NOTE ON OPTIMUM GROUP SIZE FOR PROGENY TESTS


A. W. NORDSKOG


Iowa State University of Science and Technology
Ames, Iowa, U.S.A. 


Robertson [1957] pointed out that two factors determine the ac- 
curacy and extent of a progeny testing program. The first is the physical 
facilities available limiting the total number of progeny which can be 
tested in any one generation. The second is the number of sires that 
will be selected each generation. He showed that solution of the 
following equation for p leads to the optimum number of sires to be 
tested in each generation: 


$$\frac{N}{S}\cdot\frac{1}{a} = \frac{2px - z}{2p(z - px)} \qquad (1)$$


where 


N = total number of animals to be tested, which is limited by physical facilities,

S = a fixed number of selected sires from a total of s tested sires,

a = heritability function of the trait in question = $(4 - h^2)/h^2$,

p = S/s = fraction of the sires selected of those tested,

z = ordinate of the normal curve at the point of truncation defined by the area p,

x = dz/dp = abscissa of the normal curve at the point defined by p.


Rendel [1959] pointed out that Robertson’s solution for optimizing 
sire progeny tests does not hold strictly in the case of half-sib family 
selection. In the present note, attention is directed towards the sire 
testing aspect of Robertson’s paper. His solution is confined to the 
special case where each dam is limited to one tested offspring and does 
not fully account for the effect, on optimum breeding composition,
of the hierarchical structure of a sire progeny group. This results, 
as in the case of poultry, when each sire has several mates and each 
mate has several offspring. The question may be raised, therefore, 
as to the optimum number of sires to test for arbitrary numbers of 
offspring per dam (n) or for arbitrary numbers of mates per sire (d). 
Taking into account both d and n should give greater generality to 
Robertson’s solution. 


The expected superiority of the S sires chosen is:

$$\Delta G = \frac{z}{p}\, R\, \sigma_s \qquad (2)$$

where
R = regression of the breeding value of the tested sires on their progeny means,
$\sigma_s$ = standard deviation of the sire progeny means.
The quantities z and p are defined as previously.

The standard deviation among the sire progeny means may be given in terms of the variance components obtained from a hierarchical classification:

$$\sigma_s = \sqrt{\frac{A}{dn} + \frac{B}{d} + C} \qquad (3)$$

where


A = component of variance among individuals full sib to each 
other, 


B = component of variance among full sib families belonging 
to a sire group, i.e., dam variance, 

C = component of variance among sire groups of half sibs, i.e., 
sire variance. 


The regression, R, is also the “repeatability” of a sire’s breeding performance. This is equivalent to the intra-class correlation,

$$R = C/\sigma_s^2 \qquad (4)$$

where C is determined solely from genetic differences among sires. Substitution of equations (3) and (4) into (2) gives:

$$\Delta G = \frac{zC}{p\sqrt{\dfrac{A}{dn} + \dfrac{B}{d} + C}} \qquad (5)$$

We assume equal numbers of progeny (n) for all dams, equal numbers of mates (d) for all sires tested and that N = sdn is a fixed number.
This leads to two possible solutions: (1) we may obtain an optimum s 
for fixed values of n, or alternatively (2) we may obtain an optimum 
s for fixed values of d. We shall deal first with Case 1. 

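Case 1. Recalling that p = S/s, that N = dnS/p and d = kp/n, where k = N/S is Robertson’s “testing ratio,”

$$\Delta G = \frac{zC}{p\sqrt{\dfrac{A + nB}{kp} + C}} \qquad (6)$$

which may be written as,

$$\Delta G = \frac{z\sqrt{C}}{\sqrt{p\,(p + a/k)}} \qquad (7)$$

where

$$a = (A + nB)/C.$$

The maximum value of $\Delta G$ is then given by Robertson’s equation (1) with this value of a.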


Resolving the variance components in terms of heritability, we take B and C each to be estimates of ¼ the genetic variance, G. Letting P = A + B + C define the phenotypic variance, then A = P - G/2, and

$$a = \left(P - \tfrac{1}{2}G + \tfrac{1}{4}nG\right)\big/\tfrac{1}{4}G.$$

Heritability of individual differences is defined as the ratio $h^2 = G/P$; then,

$$a = (4 - 2h^2 + nh^2)/h^2.$$

Thus, the principal result is that $\Delta G$ is a maximum for p when

$$\frac{N}{S}\cdot\frac{h^2}{4 - 2h^2 + nh^2} = \frac{2px - z}{2p(z - px)} \qquad (8)$$

As Robertson showed, the right hand side of the equation, being 
a function of p, is easily calculated from tables of the normal distribution. 
For n = 1 then,

$$h^2/(4 - 2h^2 + nh^2) = h^2/(4 - h^2) = 1/a$$

which makes equation (8) identical to Robertson’s equation (1).


Employing a graph of equation (8) corresponding to Robertson’s Figure 1 shows that the optimum p (and accordingly the optimum number of sires to test, s) varies for different values of n. For example, letting N = 1000, S = 5, and $h^2 = 0.1$, the optimum s = 41, 40, 39 and 37 for n = 1, 5, 10 and 20, respectively. Since N = sdn is fixed, this would require that the number of mates (d) per sire be 24.4, 5, 2.6 and 1.4, respectively.*

Genetic gain, $\Delta G$, is maximum when n = 1; hence, testing more than one progeny per dam lowers $\Delta G$. However, in the above example for n = 10, the loss in $\Delta G$ compared with that for n = 1 is only 5 per cent. This can be verified by substitution of appropriate values into equation (6) or (7). It is evident, therefore, for the usual situation when sire testing is relevant (i.e., heritability for the trait in question is low) that testing one offspring per dam has little to recommend it if more are available, as in the case of poultry.
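The Case 1 calculation can be checked numerically. The following is a minimal sketch, assuming equation (8) has the form $(N/S)\,h^2/(4 - 2h^2 + nh^2) = (2px - z)/(2p(z - px))$ and that gain is proportional to $z/\sqrt{p(p + a/k)}$; the function names and the bisection bracket are illustrative choices, not part of the original note. Because the author read his optima from a graph and the maximum is flat, the numerical roots need not coincide exactly with the quoted values of s.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unit normal distribution


def truncation_point(p):
    """Return (x, z): abscissa and ordinate where the upper-tail area is p."""
    x = STD_NORMAL.inv_cdf(1.0 - p)
    return x, STD_NORMAL.pdf(x)


def rhs_equation_8(p):
    """Right-hand side of equation (8): (2px - z) / (2p(z - px))."""
    x, z = truncation_point(p)
    return (2 * p * x - z) / (2 * p * (z - p * x))


def heritability_factor(h2, n):
    """a = (4 - 2h^2 + n h^2) / h^2 for n offspring per dam."""
    return (4 - 2 * h2 + n * h2) / h2


def optimum_p(N, S, h2, n, lo=0.01, hi=0.25):
    """Bisect for the p satisfying equation (8).

    Assumption: the right-hand side falls steadily as p rises over the bracket.
    """
    target = (N / S) / heritability_factor(h2, n)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if rhs_equation_8(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_gain(p, N, S, h2, n):
    """Gain in the form of equation (7), omitting the constant factor sqrt(C)."""
    _, z = truncation_point(p)
    a = heritability_factor(h2, n)
    k = N / S
    return z / sqrt(p * (p + a / k))


if __name__ == "__main__":
    N, S, h2 = 1000, 5, 0.1
    base = relative_gain(optimum_p(N, S, h2, 1), N, S, h2, 1)
    for n in (1, 5, 10, 20):
        p = optimum_p(N, S, h2, n)
        s = S / p
        gain = relative_gain(p, N, S, h2, n)
        print(f"n={n:2d}  s={s:5.1f}  d={N / (s * n):5.1f}  relative gain={gain / base:.3f}")
```

The last column expresses each gain as a fraction of the gain at n = 1, which is the comparison the text makes for n = 10.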

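Case 2. The alternative solution is to obtain the maximum $\Delta G$ for specified numbers of mates per sire. We obtain $\Delta G$ in terms of p and d, thus,

$$\Delta G = \frac{zC}{p\sqrt{\dfrac{A}{kp} + \dfrac{B}{d} + C}} \qquad (9)$$

At $\Delta G$ maximum,

$$\frac{N}{S}\cdot\frac{B + dC}{dA} = \frac{N}{S}\cdot\frac{d + 1}{d}\cdot\frac{h^2}{4 - 2h^2} = \frac{2px - z}{2p(z - px)} \qquad (10)$$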

Equation (10) shows that optimum p (and optimum s) varies for each particular d. For N = 1000, S = 5, and $h^2 = 0.1$, the optimum s = 44 and 42 for d = 5 and 25, respectively. Maximum $\Delta G$ is reached when d is made sufficiently large (i.e., 25), allowing n to equal one. Yet the loss of $\Delta G$ for d = 5 compared with d = 25 is only about 3 per cent, which can be verified by substitution of appropriate values into equation (9). This means that increasing the number of mates per sire much above 5 is not a critical issue when the heritability is as low as 0.1, and when the total number of progeny per sire (dn) is kept reasonably constant.
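The Case 2 comparison can be sketched in the same way by maximizing the gain directly. This sketch assumes that equation (9) reads $\Delta G = zC/\big(p\sqrt{A/(kp) + B/d + C}\big)$, scales the phenotypic variance to P = 1 so that B = C = $h^2/4$ and A = $1 - h^2/2$, and scans whole numbers of sires; the function names are hypothetical. The implied n = N/(sd) is allowed to be fractional, as in the text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unit normal distribution


def gain_case_2(s, N, S, h2, d):
    """Gain for s tested sires and d mates per sire, with P scaled to 1."""
    p = S / s                      # fraction of tested sires selected
    k = N / S                      # Robertson's testing ratio
    A, B, C = 1 - h2 / 2, h2 / 4, h2 / 4
    x = STD_NORMAL.inv_cdf(1.0 - p)
    z = STD_NORMAL.pdf(x)
    return z * C / (p * sqrt(A / (k * p) + B / d + C))


def optimum_sires(N, S, h2, d):
    """Scan whole numbers of tested sires and keep the one with the largest gain."""
    candidates = range(S + 1, N // 2)
    return max(candidates, key=lambda s: gain_case_2(s, N, S, h2, d))


if __name__ == "__main__":
    N, S, h2 = 1000, 5, 0.1
    best = {d: optimum_sires(N, S, h2, d) for d in (5, 25)}
    gains = {d: gain_case_2(best[d], N, S, h2, d) for d in (5, 25)}
    print("optimum s:", best)
    print("loss for d = 5 relative to d = 25:", 1 - gains[5] / gains[25])
```

A direct scan is used here instead of solving equation (10) so that the flatness of the maximum can be inspected by printing the gain for neighbouring values of s.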

Perhaps one might assume on intuitive grounds that equations (8) and (10) are exactly equivalent, since N = sdn is fixed. Thus, using equation (8) for a particular n, say $n_1$, we solve for, say $s_1$, and obtain $d_1 = N/s_1n_1$. If then we take $d = d_1$ and solve for $s_2$ using equation (10), we find that $s_1$ and $s_2$ are not necessarily the same. The reason is that these two equations determine maxima along two different lines at right angles to one another in the coordinate system defined by log s, log d, and log n.

* The fractions arise since sn is divided into a constant N = 1000.

Further, since no sharp optima in $\Delta G$ are reached in a sire testing program for varying numbers of mates or varying numbers of offspring per dam when the heritability is low, a more realistic and satisfactory solution to optimum breeding composition would need to take into account the economic values and costs of the testing program. Thus, it is not necessarily true that the optimum number of offspring per dam (n) is unity, though purely on the basis of statistical efficiency the optimum is this.

Perhaps a number of different approaches to this problem are possible. These would depend on the particular set of economic factors unique to each kind of breeding enterprise, e.g., dairy cattle, swine, poultry. In principle, some function of $\Delta G$ giving total economic value, Y, needs to be found while a cost factor, X, would need to be attached to each of the s sires tested. The solution would be to maximize the difference, Y - sX, with respect to s.


SUMMARY 


Robertson’s [1957] solution for optimum number of sires to test in a progeny testing scheme has been extended to take into account the number of offspring (n) per dam and the number of mates (d) per sire. Although genetic gain ($\Delta G$) is maximum when n = 1, the loss is quite small for n as large as 10 when heritability is low. Also, in the case of low heritability, increasing d much above 5 is not critical, because no sharp optimum in $\Delta G$ is reached.
PLEIOTROPISM AND THE GENETIC VARIANCE
AND COVARIANCE


C. J. MODE

Department of Mathematics, Montana State College, Bozeman, Montana, U.S.A.
AND

H. F. ROBINSON

Department of Statistics, North Carolina State College,

Raleigh, North Carolina, U.S.A.


INTRODUCTION AND THEORY 


In classical genetics many genes are known to have manifold effects, i.e., the gene seems to affect unrelated characters. An example of such a gene is the “vestigial gene” in Drosophila which affects not only the bristles and wings but also fecundity. Examples of similar genes in other organisms abound in the literature. When a gene has manifold effects, its action is called pleiotropic.


In quantitative genetics, just as in qualitative genetics, it is easy 
to conceive of a gene affecting many characteristics. We shall there- 
fore consider a random mating population with respect to one segre- 
gating locus at which there is an arbitrary number of alleles and 
suppose two characters X and Y are observed. If the genes act pleiotropically, to any genotype, $A_iA_j$, there correspond two genotypic values $X_{ij}$ and $Y_{ij}$, one for the character X and one for the character Y.

Let the genotypic values $X_{ij}$ and $Y_{ij}$ be deviations from their respective means. Then if $p_i$ and $p_j$ represent the frequencies of genes $A_i$ and $A_j$ in the population, the X’s and Y’s satisfy the conditions

$$\sum_{i,j} p_ip_jX_{ij} = \sum_{i,j} p_ip_jY_{ij} = 0,$$

where summation extends over all alleles. We now make the following set of definitions. The additive effects of genes with respect to the characters in question are defined as

$$\alpha_i^* = \sum_j p_jX_{ij}, \qquad \alpha_i = \sum_j p_jY_{ij}.$$

Because of symmetry of the genetic mechanism $\alpha_i^* = \alpha_j^*$ and $\alpha_i = \alpha_j$ if i = j. The dominance deviations are defined as

$$\delta_{ij}^* = X_{ij} - \alpha_i^* - \alpha_j^*, \qquad \delta_{ij} = Y_{ij} - \alpha_i - \alpha_j.$$


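The components of genetic variance and covariance are defined as follows: The total genetic variances and covariance with respect to the two characteristics are defined as

$$\sigma_G^2 = \sum_{i,j} p_ip_jY_{ij}^2, \qquad \sigma_{G^*}^2 = \sum_{i,j} p_ip_jX_{ij}^2, \qquad \sigma_{GG^*} = \sum_{i,j} p_ip_jX_{ij}Y_{ij}.$$

The additive genetic variances and covariance are defined as

$$\sigma_A^2 = 2\sum_i p_i\alpha_i^2, \qquad \sigma_{A^*}^2 = 2\sum_i p_i\alpha_i^{*2}, \qquad \sigma_{AA^*} = 2\sum_i p_i\alpha_i\alpha_i^*,$$

and the dominance variances and covariance are defined as

$$\sigma_D^2 = \sum_{i,j} p_ip_j\delta_{ij}^2, \qquad \sigma_{D^*}^2 = \sum_{i,j} p_ip_j\delta_{ij}^{*2}, \qquad \sigma_{DD^*} = \sum_{i,j} p_ip_j\delta_{ij}\delta_{ij}^*.$$

We then see the following are identities:

$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2, \qquad \sigma_{G^*}^2 = \sigma_{A^*}^2 + \sigma_{D^*}^2, \qquad \sigma_{GG^*} = \sigma_{AA^*} + \sigma_{DD^*}.$$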


It is thus possible to speak of the additive and dominance com- 
ponents of the genetic covariance just as we speak of additive and 
dominance components of the genetic variances. 
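The single-locus partition can be verified on a small numerical case. The sketch below assumes the additive effect of allele i is the frequency-weighted mean of the centred genotypic values carrying that allele, and that genotypic values are symmetric in the two alleles; the allele frequencies and genotypic values are invented purely for illustration.

```python
def centre(values, freqs):
    """Express genotypic values as deviations from the random-mating mean."""
    k = len(freqs)
    mean = sum(freqs[i] * freqs[j] * values[i][j] for i in range(k) for j in range(k))
    return [[values[i][j] - mean for j in range(k)] for i in range(k)]


def additive_effects(values, freqs):
    """alpha_i = sum_j p_j V_ij for centred genotypic values V."""
    k = len(freqs)
    return [sum(freqs[j] * values[i][j] for j in range(k)) for i in range(k)]


def covariance_partition(Y, X, freqs):
    """Return (total, additive, dominance) genetic covariance of characters Y and X.

    Passing the same matrix twice gives the corresponding variance components.
    """
    k = len(freqs)
    Y, X = centre(Y, freqs), centre(X, freqs)
    a, a_star = additive_effects(Y, freqs), additive_effects(X, freqs)
    total = sum(freqs[i] * freqs[j] * Y[i][j] * X[i][j]
                for i in range(k) for j in range(k))
    additive = 2 * sum(freqs[i] * a[i] * a_star[i] for i in range(k))
    dominance = sum(freqs[i] * freqs[j]
                    * (Y[i][j] - a[i] - a[j]) * (X[i][j] - a_star[i] - a_star[j])
                    for i in range(k) for j in range(k))
    return total, additive, dominance


if __name__ == "__main__":
    # Hypothetical three-allele locus; matrices are symmetric in the two alleles.
    freqs = [0.2, 0.3, 0.5]
    Y = [[10, 12, 15], [12, 11, 18], [15, 18, 14]]
    X = [[3, 5, 4], [5, 9, 6], [4, 6, 2]]
    total, additive, dominance = covariance_partition(Y, X, freqs)
    print(total, additive + dominance)
```

The two printed numbers should agree to rounding error, which is the covariance identity stated above; unlike a variance component, the additive or dominance covariance may be negative.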

The above arguments may be generalized in a natural way to a 
random mating population in which n loci are segregating with an 
arbitrary number of alleles at each locus. The reader is referred to 
Kempthorne [1957] and elsewhere for the partition of the genetic 
variance in such a population into the additive, dominant, and epi- 
static components. We state here without proof, although a proof 
would not be difficult to construct, that the genetic covariance may 
be partitioned in the same way. In fact, the genetic covariance will 
contain the same finite number of terms as the variance. Thus if we let $\sigma_{GG^*}$ be the total genetic covariance, $\sigma_{AA^*}$ the additive covariance, $\sigma_{DD^*}$ the dominance covariance, $\sigma_{AAA^*A^*}$ the additive × additive covariance, $\sigma_{ADA^*D^*}$ the additive × dominance, $\sigma_{DDD^*D^*}$ the dominance × dominance covariance and so on we may write

$$\sigma_{GG^*} = \sigma_{AA^*} + \sigma_{DD^*} + \sigma_{AAA^*A^*} + \sigma_{ADA^*D^*} + \sigma_{DDD^*D^*} + \cdots$$

From the above components of genetic variance and covariance a number of parameters may be defined which, when estimated, throw light on the nature of the underlying genetic mechanism. Three such
parameters are the genetic, genotypic, and phenotypic coefficients of 
correlation. The underlying genetic mechanism causing a linear as- 
sociation between two characters may be due to pleiotropy, linkage, 
or both. The genetic coefficient of correlation is defined by

$$\sigma_{AA^*}/\sigma_A\sigma_{A^*},$$

the ratio of the additive genetic covariance to the geometric mean of the components of additive genetic variance.
The genotypic coefficient of correlation is defined by

$$\sigma_{GG^*}/\sigma_G\sigma_{G^*},$$

and involves the total genetic variance and covariance. The phenotypic coefficient of correlation is calculated from the total genetic variance and covariance plus an environmental variance and covariance. This parameter is defined as

$$\sigma_{PP^*}/\sigma_P\sigma_{P^*}.$$


The environmental variance is usually defined relative to some ex- 
perimental design, so we defer the definition of this quantity to a later 
section. 

A fourth parameter of interest is the measure of the average degree of dominance which is defined as

$$\bar{a} = \sqrt{2\sigma_D^2/\sigma_A^2},$$

the square root of the ratio of twice the dominance variance to the additive variance. For a detailed discussion of the assumptions underlying the estimation of this parameter and its genetic interpretation, the reader is referred to Comstock and Robinson [1948].

If an experimenter wishes to obtain estimates of the above genetic 
parameters, it is necessary to perform an experiment. A common 
experiment is to take a random sample of males and mate them to a 
random sample of females, using each female once and only once. The 
reason for designing an experiment in such a way is that the variances 
and covariances of male means as well as female means within males 
may be expressed in terms of the components of genetic variance and 
covariance. A number of authors have obtained expressions for the 
variances of the above means in terms of the genotypic covariances 
between relatives. In short it may be shown that the variance of the 
mean of all offspring of a given male is equal to the covariance of half 
sibs, and the variance of the mean of the offspring of an individual 
female mated to a given male is equal to the covariance of full sibs 
minus the covariance of half sibs. However, the arguments used in
obtaining covariances between relatives are not applicable when one 
is dealing with the genetic covariance between different characteristics. 
We shall therefore derive the variances and covariances of the means 
in question by direct calculations. The arguments that follow are 
essentially a generalization of those of Comstock and Robinson [1948]. 

Consider a random mating population with respect to n segregating loci and an arbitrary number of alleles at each locus. In the interests of standard notation, we shall follow Kempthorne and let the superscript stand for the locus and the subscript stand for a particular allele at a locus. With this convention, an arbitrary genotype may be symbolized as

$$\prod_{a=1}^{n} A^a_{i_a}A^a_{j_a}.$$

Choose a male at random from the population and let his genotype be

$$\prod_{a=1}^{n} A^a_{r_a}A^a_{s_a}.$$

If the mating is random and the population is in linkage equilibrium, the probability of selecting such a male is

$$\prod_{a=1}^{n} p^a_{r_a}p^a_{s_a}.$$

Here $p^a_{r_a}$ and $p^a_{s_a}$ represent the frequencies of alleles $A^a_{r_a}$ and $A^a_{s_a}$. The gametic array of the male is

$$\frac{1}{2^n}\prod_{a=1}^{n}\left(A^a_{r_a} + A^a_{s_a}\right).$$

Next pick a female at random and let her genotype be

$$\prod_{a=1}^{n} A^a_{t_a}A^a_{u_a}.$$

If this female comes from the same population, the probability of selecting this female is

$$\prod_{a=1}^{n} p^a_{t_a}p^a_{u_a}.$$
If the male and female are mated, the genotypic array of the offspring is formed from the product of the gametic arrays of the parents and may be symbolized as

$$\frac{1}{2^{2n}}\prod_{a=1}^{n}\left(A^a_{r_a} + A^a_{s_a}\right)\left(A^a_{t_a} + A^a_{u_a}\right).$$

Or, we may express it more fully as

$$\frac{1}{2^{2n}}\prod_{a=1}^{n}\left(A^a_{r_a}A^a_{t_a} + A^a_{r_a}A^a_{u_a} + A^a_{s_a}A^a_{t_a} + A^a_{s_a}A^a_{u_a}\right).$$

Upon expansion of the above product, we obtain a sum containing $2^{2n}$ terms. A typical term is of the form

$$\prod_{a=1}^{n} A^a_{m_a}A^a_{f_a},$$

where $m_a$ and $f_a$ stand for the allele contributed by the male and female respectively.
The mean of the offspring of the mating with respect to character Y is obtained by substituting the genotypic values for the genotypic symbols and is

$$\mu_F = \frac{1}{2^{2n}}\sum Y_{a(m_af_a)}.$$

The mean of the offspring with respect to character X is

$$\mu_{F^*} = \frac{1}{2^{2n}}\sum X_{a(m_af_a)}.$$

In the above expressions the sum extends over all offspring and $a(m_af_a)$ is short for 2n subscripts, a pair of subscripts representing each segregating locus.

The mean of all offspring of male

$$\prod_{a=1}^{n} A^a_{r_a}A^a_{s_a}$$

with respect to character Y is

$$\mu_M = \frac{1}{2^{2n}}\sum\prod_{a=1}^{n} p^a_{t_a}p^a_{u_a}\,Y_{a(m_af_a)}.$$

The mean of all offspring of the male in question with respect to character X is

$$\mu_{M^*} = \frac{1}{2^{2n}}\sum\prod_{a=1}^{n} p^a_{t_a}p^a_{u_a}\,X_{a(m_af_a)}.$$

The sums in both means extend over all segregating loci and all alleles contributed by the females mated to the male under consideration.
We seek the variance and covariance of male means and the variance and covariance of female means within males. The variances of the male means with respect to the two characteristics are given by

$$\mathrm{var}(M) = E(\mu_M^2), \qquad \mathrm{var}(M^*) = E(\mu_{M^*}^2).$$

The covariance of the male means with respect to the two characters is

$$\mathrm{cov}(M, M^*) = E(\mu_M\mu_{M^*}).$$

The variances and covariances of female means within males are given by

$$\mathrm{var}(F) = E(\mu_F^2) - E(\mu_M^2), \qquad \mathrm{var}(F^*) = E(\mu_{F^*}^2) - E(\mu_{M^*}^2),$$
$$\mathrm{cov}(F, F^*) = E(\mu_F\mu_{F^*}) - E(\mu_M\mu_{M^*}).$$


We wish to express these variances and covariances in terms of the components of genetic variance and covariance. We begin by considering the terms in $\mu_F$ and $\mu_M$. First consider the population with respect to a single segregating locus. If the male contributes alleles $A_r$ and $A_s$ and the female alleles $A_t$ and $A_u$, the mean of the offspring expressed in terms of the genotypic values is

$$\mu_F = \tfrac{1}{4}\left(Y_{rt} + Y_{ru} + Y_{st} + Y_{su}\right).$$

Multiplying by $p_tp_u$, the zygotic frequency of a particular female, and summing over all possible alleles, we find the mean of all offspring arising from a particular male is

$$\mu_M = \tfrac{1}{4}\sum_{t,u} p_tp_u\left(Y_{rt} + Y_{ru} + Y_{st} + Y_{su}\right)$$

or

$$\mu_M = \tfrac{1}{4}\left(2Y_{r\cdot} + 2Y_{s\cdot}\right).$$


Here we let the conventional dot stand for summation over a subscript. 

If the population were considered with respect to two segregating loci, one would find that $\mu_F$ contains sixteen terms and that $\mu_M$ would reduce to a sum containing four terms of the form

$$\mu_M = \tfrac{1}{16}\left(4Y_{r_1\cdot r_2\cdot} + 4Y_{r_1\cdot s_2\cdot} + 4Y_{s_1\cdot r_2\cdot} + 4Y_{s_1\cdot s_2\cdot}\right).$$

The subscripts $r_1$, $s_1$, $r_2$, and $s_2$ are associated with alleles $A^1_{r_1}$ and $A^1_{s_1}$ contributed by the male at the first locus and alleles $A^2_{r_2}$ and $A^2_{s_2}$ contributed by the male at the second locus. The dots again stand for summation over all alleles contributed by the females mated to the male.

In general when n loci are segregating $\mu_F$ contains $2^{2n}$ terms, and $\mu_M$ reduces to a sum containing $2^n$ terms each with a coefficient of $2^n$. Hence in general

$$\mu_M = \frac{1}{2^n}\sum Y_{a(m_a\cdot)},$$

where the sum extends over $2^n$ terms, $a(m_a\cdot)$ is short for 2n subscripts, and the dot represents summation over all alleles contributed by the females mated to the male. By repeating the argument we find

$$\mu_{M^*} = \frac{1}{2^n}\sum X_{a(m_a\cdot)}.$$

In order to express the variances and covariances under consideration in terms of the components of genetic variance and covariance, it is convenient to express $\mu_F$ and $\mu_M$ in terms of the genetic effects. If the population is segregating at a single locus, a genotypic value may be expressed as a linear combination of the additive effects, $\alpha_i$ and $\alpha_j$, and the dominance deviations $\delta_{ij}$. Thus

$$Y_{ij} = \alpha_i + \alpha_j + \delta_{ij}.$$

If the population is segregating at two loci, any genotypic value, $Y_{ijkl}$, may be written as

$$Y_{ijkl} = \alpha_i + \alpha_j + \alpha_k + \alpha_l + \delta_{ij} + \delta_{kl} + (\alpha\alpha)_{ik} + (\alpha\alpha)_{il} + (\alpha\alpha)_{jk} + (\alpha\alpha)_{jl}$$
$$\qquad + (\alpha\delta)_{i,kl} + (\alpha\delta)_{j,kl} + (\alpha\delta)_{ij,k} + (\alpha\delta)_{ij,l} + (\delta\delta)_{ijkl}$$

where the $\alpha$’s are the additive effects, the $\delta$’s the dominance deviations, the $(\alpha\alpha)$’s the additive × additive effects, the $(\alpha\delta)$’s the additive × dominance effects, and $(\delta\delta)$ the dominance × dominance effect. Similar expressions may be written down for $X_{ij}$ and $X_{ijkl}$.

If a particular male contributes alleles $A_r$ and $A_s$, we find the mean of all offspring arising from this male expressed in terms of the genetic effects in the single locus case is

$$\mu_M = \tfrac{1}{2}(\alpha_r + \alpha_s).$$

Continuing in the same way, we find that if the male contributes alleles $A^a_{r_a}$ and $A^a_{s_a}$ (a = 1, 2), the mean of all offspring of this male is

$$\mu_M = \tfrac{1}{4}\left[2(\alpha_{r_1} + \alpha_{s_1} + \alpha_{r_2} + \alpha_{s_2}) + (\alpha\alpha)_{r_1r_2} + (\alpha\alpha)_{r_1s_2} + (\alpha\alpha)_{s_1r_2} + (\alpha\alpha)_{s_1s_2}\right].$$

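The variance of the male means is obtained by averaging over all possible males. In the single locus case, the variance of the male means is

$$\mathrm{var}(M) = E(\mu_M^2) = \tfrac{1}{4}\sigma_A^2,$$

and in the two loci case the variance of the male means is

$$\mathrm{var}(M) = \tfrac{1}{4}\sigma_A^2 + \tfrac{1}{16}\sigma_{AA}^2$$

since

$$E(\alpha_{r_1} + \alpha_{s_1} + \alpha_{r_2} + \alpha_{s_2})^2 = \sigma_A^2, \qquad E[(\alpha\alpha)_{r_1r_2} + (\alpha\alpha)_{r_1s_2} + (\alpha\alpha)_{s_1r_2} + (\alpha\alpha)_{s_1s_2}]^2 = \sigma_{AA}^2$$

and all terms are uncorrelated.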
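The single-locus result, that the variance of male means is one quarter of the additive variance (and likewise for the covariance), can be confirmed by direct enumeration. This is a minimal sketch under the assumptions of random mating and one locus; the frequencies and genotypic values are hypothetical and the helper names are not from the paper.

```python
def centre(values, freqs):
    """Express genotypic values as deviations from the random-mating mean."""
    k = len(freqs)
    mean = sum(freqs[i] * freqs[j] * values[i][j] for i in range(k) for j in range(k))
    return [[values[i][j] - mean for j in range(k)] for i in range(k)]


def male_mean(r, s, values, freqs):
    """Mean of all offspring of male (r, s): 1/4 (2 V_r. + 2 V_s.)."""
    k = len(freqs)
    v_r = sum(freqs[t] * values[r][t] for t in range(k))
    v_s = sum(freqs[t] * values[s][t] for t in range(k))
    return 0.25 * (2 * v_r + 2 * v_s)


def male_mean_moments(Y, X, freqs):
    """Return (cov(M, M*), additive covariance) for characters Y and X."""
    k = len(freqs)
    Y, X = centre(Y, freqs), centre(X, freqs)
    # Average the product of male means over all males, weighted by p_r p_s.
    cov_m = sum(freqs[r] * freqs[s]
                * male_mean(r, s, Y, freqs) * male_mean(r, s, X, freqs)
                for r in range(k) for s in range(k))
    alpha = [sum(freqs[j] * Y[i][j] for j in range(k)) for i in range(k)]
    alpha_star = [sum(freqs[j] * X[i][j] for j in range(k)) for i in range(k)]
    additive = 2 * sum(freqs[i] * alpha[i] * alpha_star[i] for i in range(k))
    return cov_m, additive


if __name__ == "__main__":
    freqs = [0.2, 0.3, 0.5]                      # hypothetical allele frequencies
    Y = [[10, 12, 15], [12, 11, 18], [15, 18, 14]]
    X = [[3, 5, 4], [5, 9, 6], [4, 6, 2]]
    var_m, sigma2_a = male_mean_moments(Y, Y, freqs)
    cov_m, sigma_aa = male_mean_moments(Y, X, freqs)
    print(var_m, sigma2_a / 4)
    print(cov_m, sigma_aa / 4)
```

With more than one locus the same enumeration would pick up the epistatic terms discussed in the text, at a cost that grows quickly with the number of loci.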


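When n loci are segregating we find each $\alpha_{i_a}$ occurs in ½ of the $2^n$ terms or $2^{n-1}$ terms, each $(\alpha\alpha)$ in $2^{n-2}$ terms, each $(\alpha\alpha\alpha)$ in $2^{n-3}$ terms and so on. Therefore, when n loci are segregating, the variance of the male means is

$$\mathrm{var}(M) = \tfrac{1}{4}\sigma_A^2 + \tfrac{1}{16}\sigma_{AA}^2 + \tfrac{1}{64}\sigma_{AAA}^2 + \cdots$$

By repeating the argument first with respect to X then with X and Y jointly it may be shown that

$$\mathrm{var}(M^*) = \tfrac{1}{4}\sigma_{A^*}^2 + \tfrac{1}{16}\sigma_{A^*A^*}^2 + \tfrac{1}{64}\sigma_{A^*A^*A^*}^2 + \cdots$$

$$\mathrm{cov}(M, M^*) = \tfrac{1}{4}\sigma_{AA^*} + \tfrac{1}{16}\sigma_{AAA^*A^*} + \tfrac{1}{64}\sigma_{AAAA^*A^*A^*} + \cdots$$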

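We next focus attention on $\mu_F$, the mean of the offspring of a particular male and female. Again we wish to find $E(\mu_F^2)$, $E(\mu_{F^*}^2)$, and $E(\mu_F\mu_{F^*})$ in terms of the components of genetic variance and covariance. If the population is segregating at a single locus and if the male contributes alleles $A_r$ and $A_s$ and the female alleles $A_t$ and $A_u$, then $\mu_F$ expressed in terms of the genetic effects is

$$\mu_F = \tfrac{1}{4}\left[2(\alpha_r + \alpha_s) + 2(\alpha_t + \alpha_u) + \delta_{rt} + \delta_{ru} + \delta_{st} + \delta_{su}\right].$$

If two loci are segregating and the male contributes alleles $A^a_{r_a}$ and $A^a_{s_a}$ and the female alleles $A^a_{t_a}$ and $A^a_{u_a}$ (a = 1, 2), we find $\mu_F$ expressed in terms of the genetic effects is

$$\mu_F = \tfrac{1}{16}\big[8(\alpha_{r_1} + \alpha_{s_1} + \alpha_{r_2} + \alpha_{s_2}) + 8(\alpha_{t_1} + \alpha_{u_1} + \alpha_{t_2} + \alpha_{u_2})$$
$$\qquad + 4\{(\alpha\alpha)_{r_1r_2} + (\alpha\alpha)_{r_1s_2} + (\alpha\alpha)_{s_1r_2} + (\alpha\alpha)_{s_1s_2}\} + 3 \text{ similar terms}$$
$$\qquad + 4(\delta_{r_1t_1} + \delta_{r_1u_1}) + 3 \text{ similar terms}$$
$$\qquad + 2\{(\alpha\delta)_{r_1,r_2t_2} + (\alpha\delta)_{r_1,r_2u_2} + (\alpha\delta)_{r_1,s_2t_2} + (\alpha\delta)_{r_1,s_2u_2}\} + 7 \text{ similar terms} + (\delta\delta)_{r_1t_1r_2t_2} + 15 \text{ similar terms}\big].$$

In the above means the terms are arranged in such a way that the expected value of the square of each term in parenthesis is equal to a variance component. Thus $E(\alpha_r + \alpha_s)^2 = \sigma_A^2$ and so on. Squaring and taking expectations, we find the expected value of $\mu_F^2$ in the single locus case is

$$E(\mu_F^2) = \tfrac{1}{2}\sigma_A^2 + \tfrac{1}{4}\sigma_D^2.$$

In the two loci case the expected value of $\mu_F^2$ is

$$E(\mu_F^2) = \tfrac{1}{256}\left(128\sigma_A^2 + 64\sigma_D^2 + 64\sigma_{AA}^2 + 32\sigma_{AD}^2 + 16\sigma_{DD}^2\right).$$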

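In general when n loci are segregating, the variance components will have coefficients as shown below, where $\sigma^2_{A^rD^s}$ denotes the component involving additive effects at r loci and dominance deviations at s loci.

Variance Component        Coefficient
$\sigma^2_{A^rD^s}$        $2^{4n-r-2s}$

In general then

$$E(\mu_F^2) = \frac{1}{2^{4n}}\sum_{r,s} 2^{4n-r-2s}\,\sigma^2_{A^rD^s}.$$

Similarly,

$$E(\mu_{F^*}^2) = \tfrac{1}{2}\sigma_{A^*}^2 + \tfrac{1}{4}\sigma_{D^*}^2 + \tfrac{1}{4}\sigma_{A^*A^*}^2 + \tfrac{1}{8}\sigma_{A^*D^*}^2 + \tfrac{1}{16}\sigma_{D^*D^*}^2 + \cdots,$$

and

$$E(\mu_F\mu_{F^*}) = \tfrac{1}{2}\sigma_{AA^*} + \tfrac{1}{4}\sigma_{DD^*} + \tfrac{1}{4}\sigma_{AAA^*A^*} + \tfrac{1}{8}\sigma_{ADA^*D^*} + \tfrac{1}{16}\sigma_{DDD^*D^*} + \cdots$$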

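Finally, by subtracting $E(\mu_M^2)$ from $E(\mu_F^2)$ and so on we obtain the variance and covariance of the female means within males:

$$\mathrm{var}(F) = \tfrac{1}{4}\sigma_A^2 + \tfrac{1}{4}\sigma_D^2 + \tfrac{3}{16}\sigma_{AA}^2 + \tfrac{1}{8}\sigma_{AD}^2 + \tfrac{1}{16}\sigma_{DD}^2 + \cdots$$

$$\mathrm{var}(F^*) = \tfrac{1}{4}\sigma_{A^*}^2 + \tfrac{1}{4}\sigma_{D^*}^2 + \tfrac{3}{16}\sigma_{A^*A^*}^2 + \tfrac{1}{8}\sigma_{A^*D^*}^2 + \tfrac{1}{16}\sigma_{D^*D^*}^2 + \cdots$$

$$\mathrm{cov}(F, F^*) = \tfrac{1}{4}\sigma_{AA^*} + \tfrac{1}{4}\sigma_{DD^*} + \tfrac{3}{16}\sigma_{AAA^*A^*} + \tfrac{1}{8}\sigma_{ADA^*D^*} + \tfrac{1}{16}\sigma_{DDD^*D^*} + \cdots$$

For completeness, we list the variances and covariance of male means with respect to the two characteristics:

$$\mathrm{var}(M) = \tfrac{1}{4}\sigma_A^2 + \tfrac{1}{16}\sigma_{AA}^2 + \tfrac{1}{64}\sigma_{AAA}^2 + \cdots$$

$$\mathrm{var}(M^*) = \tfrac{1}{4}\sigma_{A^*}^2 + \tfrac{1}{16}\sigma_{A^*A^*}^2 + \tfrac{1}{64}\sigma_{A^*A^*A^*}^2 + \cdots$$

$$\mathrm{cov}(M, M^*) = \tfrac{1}{4}\sigma_{AA^*} + \tfrac{1}{16}\sigma_{AAA^*A^*} + \tfrac{1}{64}\sigma_{AAAA^*A^*A^*} + \cdots$$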


EXPERIMENTAL DESIGN AND RELIABILITY OF ESTIMATES 


Design I of Comstock and Robinson is an experiment similar to that described in the preceding section. The experimental material is produced from matings among plants of the $F_2$ generation of a cross between inbred lines or varieties. A random sample of sm males is chosen and each male is mated to a random sample of n females. No female is used more than once. The experimental material is made up of the offspring of these smn matings and is divided into s sets. Each set thus forms a distinct unit of the experiment and is planted in a classical randomized complete block design, with mn entries and r replications, the offspring of m males each mated to n females forming a set. Observations are taken on about ten individual plants in a plot and the plot means are recorded.

If we take measurements on two characters X and Y, the mean of a particular plot may be represented by the linear models

$$Y_{ijkl} = \mu + s_i + r_{ij} + m_{ik} + f_{ikl} + e_{ijkl},$$
$$X_{ijkl} = \mu^* + s_i^* + r_{ij}^* + m_{ik}^* + f_{ikl}^* + e_{ijkl}^*,$$
$$i = 1, 2, \ldots, s; \quad j = 1, 2, \ldots, r; \quad k = 1, 2, \ldots, m; \quad l = 1, 2, \ldots, n,$$

where the $\mu$’s are the population means, the $s_i$’s are the effects of the ith set, the $r_{ij}$’s are the effects of the jth replicate in the ith set, the $m_{ik}$’s are the effects of the kth male in the ith set, the $f_{ikl}$’s are the effects of the lth female mated to the kth male in the ith set, and the $e_{ijkl}$’s are random errors.

From the nature of the experimental material, it seems reasonable to assume that the males and females are sampled from an indefinitely large population, and assigned to the sets at random. That is, from genetic theory a very large number of males and females is conceivable. We shall further assume that the m’s, f’s, and e’s are bivariate normal variables jointly distributed around means of zero with variances and covariances $(\sigma_m^2, \sigma_{m^*}^2, \sigma_{mm^*})$, $(\sigma_f^2, \sigma_{f^*}^2, \sigma_{ff^*})$, and $(\sigma_e^2, \sigma_{e^*}^2, \sigma_{ee^*})$ respectively. The s’s and the r’s will be regarded as fixed so that

$$\sum_i s_i = \sum_i s_i^* = 0, \quad \text{and} \quad \sum_j r_{ij} = \sum_j r_{ij}^* = 0.$$


The appropriate analysis of variance and covariance for the design is 
given in Table I. 


TABLE I
ANALYSIS OF VARIANCE AND COVARIANCE

Source          d.f.                  x²      xy      y²
Sets            s - 1                 S_xx    S_xy    S_yy
Replications    s(r - 1)              R_xx    R_xy    R_yy
Males           s(m - 1)              M_xx    M_xy    M_yy
Females         sm(n - 1)             F_xx    F_xy    F_yy
Remainder       s(mn - 1)(r - 1)      E_xx    E_xy    E_yy

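Letting the dot represent summation over a subscript, the various sums of squares for Y are calculated in the following ways.

$$S_{yy} = \sum_i \frac{Y_{i\cdots}^2}{rmn} - \frac{Y_{\cdots\cdot}^2}{srmn}, \qquad R_{yy} = \sum_{ij} \frac{Y_{ij\cdot\cdot}^2}{mn} - \sum_i \frac{Y_{i\cdots}^2}{rmn},$$

$$M_{yy} = \sum_{ik} \frac{Y_{i\cdot k\cdot}^2}{rn} - \sum_i \frac{Y_{i\cdots}^2}{rmn}, \qquad F_{yy} = \sum_{ikl} \frac{Y_{i\cdot kl}^2}{r} - \sum_{ik} \frac{Y_{i\cdot k\cdot}^2}{rn},$$

$$E_{yy} = \sum_{ijkl} Y_{ijkl}^2 - \sum_{ikl} \frac{Y_{i\cdot kl}^2}{r} - \sum_{ij} \frac{Y_{ij\cdot\cdot}^2}{mn} + \sum_i \frac{Y_{i\cdots}^2}{rmn}.$$

The various sums of squares for X may be obtained by interchanging X and Y.

The corresponding expressions for the cross-products are as follows:

$$S_{xy} = \sum_i \frac{X_{i\cdots}Y_{i\cdots}}{rmn} - \frac{X_{\cdots\cdot}Y_{\cdots\cdot}}{srmn}, \qquad R_{xy} = \sum_{ij} \frac{X_{ij\cdot\cdot}Y_{ij\cdot\cdot}}{mn} - \sum_i \frac{X_{i\cdots}Y_{i\cdots}}{rmn},$$

$$M_{xy} = \sum_{ik} \frac{X_{i\cdot k\cdot}Y_{i\cdot k\cdot}}{rn} - \sum_i \frac{X_{i\cdots}Y_{i\cdots}}{rmn}, \qquad F_{xy} = \sum_{ikl} \frac{X_{i\cdot kl}Y_{i\cdot kl}}{r} - \sum_{ik} \frac{X_{i\cdot k\cdot}Y_{i\cdot k\cdot}}{rn},$$

$$E_{xy} = \sum_{ijkl} X_{ijkl}Y_{ijkl} - \sum_{ikl} \frac{X_{i\cdot kl}Y_{i\cdot kl}}{r} - \sum_{ij} \frac{X_{ij\cdot\cdot}Y_{ij\cdot\cdot}}{mn} + \sum_i \frac{X_{i\cdots}Y_{i\cdots}}{rmn}.$$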

Under the assumption that the males and females are randomly and independently sampled and their offspring are assigned to the sets and replications at random, it is only a matter of straightforward algebra to show that the mean squares have the following expectations.

$$\frac{1}{s - 1}E(S_{yy}) = \sigma_e^2 + r\sigma_f^2 + rn\sigma_m^2 + rmn\sigma_s^2,$$
$$\frac{1}{s(r - 1)}E(R_{yy}) = \sigma_e^2 + mn\sigma_r^2,$$
$$\frac{1}{s(m - 1)}E(M_{yy}) = \sigma_e^2 + r\sigma_f^2 + rn\sigma_m^2,$$
$$\frac{1}{sm(n - 1)}E(F_{yy}) = \sigma_e^2 + r\sigma_f^2, \qquad \frac{1}{s(mn - 1)(r - 1)}E(E_{yy}) = \sigma_e^2.$$

Note, we define

$$\sigma_s^2 = \sum_i s_i^2/(s - 1), \qquad \sigma_r^2 = \sum_{ij} r_{ij}^2/s(r - 1).$$
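The expected mean squares for the X-variable of course have the same forms and may be written down by replacing the above subscripts with starred subscripts and interchanging X and Y.

The expectations of the mean cross-products are listed below:

$$\frac{1}{s - 1}E(S_{xy}) = \sigma_{ee^*} + r\sigma_{ff^*} + rn\sigma_{mm^*} + rmn\sigma_{ss^*},$$
$$\frac{1}{s(r - 1)}E(R_{xy}) = \sigma_{ee^*} + mn\sigma_{rr^*},$$
$$\frac{1}{s(m - 1)}E(M_{xy}) = \sigma_{ee^*} + r\sigma_{ff^*} + rn\sigma_{mm^*},$$
$$\frac{1}{sm(n - 1)}E(F_{xy}) = \sigma_{ee^*} + r\sigma_{ff^*}, \qquad \frac{1}{s(mn - 1)(r - 1)}E(E_{xy}) = \sigma_{ee^*}.$$

Here we define

$$\sigma_{ss^*} = \sum_i s_is_i^*/(s - 1), \qquad \sigma_{rr^*} = \sum_{ij} r_{ij}r_{ij}^*/s(r - 1).$$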


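From the theory of the previous section, $\sigma_m^2 = \mathrm{var}(M)$, $\sigma_{m^*}^2 = \mathrm{var}(M^*)$, $\sigma_f^2 = \mathrm{var}(F)$, $\sigma_{f^*}^2 = \mathrm{var}(F^*)$, $\sigma_{mm^*} = \mathrm{cov}(M, M^*)$, and $\sigma_{ff^*} = \mathrm{cov}(F, F^*)$. If we let the lower case letters stand for the mean squares or products, i.e.,

$$m_{yy} = \frac{1}{s(m - 1)}M_{yy}$$

and so on, we see that var (M) is estimated by

$$s_m^2 = (m_{yy} - f_{yy})/rn.$$

Similarly, the estimates of var (M*), var (F), var (F*), cov (M, M*), and cov (F, F*) are given by

$$s_{m^*}^2 = (m_{xx} - f_{xx})/rn, \qquad s_f^2 = (f_{yy} - e_{yy})/r, \qquad s_{f^*}^2 = (f_{xx} - e_{xx})/r,$$
$$s_{mm^*} = (m_{xy} - f_{xy})/rn, \qquad s_{ff^*} = (f_{xy} - e_{xy})/r.$$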
Provided the epistatic components of the genetic variances and covariance are zero, we see $4s_m^2$, $4s_{m^*}^2$, and $4s_{mm^*}$ estimate the additive variances and covariances, and

$$s_D^2 = 4(s_f^2 - s_m^2) = \frac{4}{rn}\left[(n + 1)f_{yy} - m_{yy} - ne_{yy}\right],$$
$$s_{D^*}^2 = 4(s_{f^*}^2 - s_{m^*}^2) = \frac{4}{rn}\left[(n + 1)f_{xx} - m_{xx} - ne_{xx}\right],$$
$$s_{DD^*} = 4(s_{ff^*} - s_{mm^*}) = \frac{4}{rn}\left[(n + 1)f_{xy} - m_{xy} - ne_{xy}\right]$$

estimate the components of dominance variance and covariance.

It should be noted that the presence of epistatic variance will always 
inflate the estimates of the additive and dominance components of 
variance. The estimate of the additive covariance, used in estimating 
the genetic coefficient of correlation, will be inflated or deflated depend- 
ing on the sign and relative magnitudes of the epistatic components 
of covariance. In the event the signs and magnitudes of the components of covariance are such that all of $\sigma_{GG^*}$ vanishes except $\sigma_{AA^*}$, the genetic correlation coefficient will still be under-estimated since the denominator is always inflated when epistatic variance is present.

From the definition of the genetic coefficient of correlation, it is easy to see the parameter is estimated by

$$s_{mm^*}\big/\sqrt{s_m^2s_{m^*}^2}.$$

In the absence of epistasis,

$$s_{ff^*}\big/\sqrt{s_f^2s_{f^*}^2}$$

estimates the genotypic coefficient of correlation. Again if epistasis is present, the above ratio will not estimate the genotypic coefficient of correlation but some modified form, modified in the sense that $s_{ff^*}$, $s_f^2$, and $s_{f^*}^2$ will not contain the total genetic variance but only a fraction of it.

If $\sigma_e^2$, $\sigma_{e^*}^2$, and $\sigma_{ee^*}$ are considered the environmental components of the phenotypic variance and covariance, the estimate of the phenotypic coefficient of correlation is, in the absence of epistasis, given by

$$\frac{4s_{ff^*} + e_{xy}}{\sqrt{(4s_f^2 + e_{yy})(4s_{f^*}^2 + e_{xx})}}.$$

Again the presence of epistasis will modify the above estimate of the coefficient of phenotypic correlation.


Finally,

$$\bar{a} = \sqrt{\frac{2\left[(n + 1)f_{yy} - m_{yy} - ne_{yy}\right]}{m_{yy} - f_{yy}}}$$

provides an estimate of the average degree of dominance with respect to character Y, since in Design I there are only 2 alleles per locus and all gene frequencies are ½. A similar expression may be written down for character X.

BIOMETRICS, DECEMBER 1959

An important question is the reliability of the estimates of the 
genetic variance and covariance and the other functions derived from 
them. Reeve [1955] gave expressions for the variance of a parameter 
called the genetic correlation coefficient, but his arguments were not 
framed in terms of the analysis of variance and covariance. 

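When a parameter is a function of moments, $\varphi(m_1, \ldots, m_k)$, an approximate expression for its sampling variance is

$$\mathrm{var}(\varphi) = \sum_i \left(\frac{\partial\varphi}{\partial m_i}\right)^2 \mathrm{var}(m_i) + \sum_{i \ne i'} \frac{\partial\varphi}{\partial m_i}\frac{\partial\varphi}{\partial m_{i'}}\,\mathrm{cov}(m_i, m_{i'}).$$

Similarly, the covariance of two functions, $\varphi_1$ and $\varphi_2$, is given by

$$\mathrm{cov}(\varphi_1, \varphi_2) = \sum_i \frac{\partial\varphi_1}{\partial m_i}\frac{\partial\varphi_2}{\partial m_i}\,\mathrm{var}(m_i) + \sum_{i \ne i'} \frac{\partial\varphi_1}{\partial m_i}\frac{\partial\varphi_2}{\partial m_{i'}}\,\mathrm{cov}(m_i, m_{i'}).$$

The reader is referred to Kendall [1945] and Cramér [1946] for the conditions under which such formulas are valid and a discussion of their limitations.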
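We now apply the above formulas to the problem at hand. Suppose $\varphi$ is a correlation coefficient of the type given above so that

$$\varphi = m_{12}\big/\sqrt{m_{11}m_{22}}$$

where $m_{12}$, $m_{11}$, and $m_{22}$ are some functions of moments. Then the variance of $\varphi$ is

$$\mathrm{var}(\varphi) = \varphi^2\left[\frac{\mathrm{var}(m_{12})}{m_{12}^2} + \frac{\mathrm{var}(m_{11})}{4m_{11}^2} + \frac{\mathrm{var}(m_{22})}{4m_{22}^2} - \frac{\mathrm{cov}(m_{12}, m_{11})}{m_{12}m_{11}} - \frac{\mathrm{cov}(m_{12}, m_{22})}{m_{12}m_{22}} + \frac{\mathrm{cov}(m_{11}, m_{22})}{2m_{11}m_{22}}\right].$$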
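The first-order variance of a correlation-type ratio can be sketched in code. The block below assumes the closed form obtained by differentiating $\log\varphi = \log m_{12} - \tfrac12\log m_{11} - \tfrac12\log m_{22}$, and cross-checks it against the general formula using numerical derivatives. The function names and the input values are illustrative only.

```python
from math import sqrt


def ratio(m12, m11, m22):
    """Correlation-type function phi = m12 / sqrt(m11 * m22)."""
    return m12 / sqrt(m11 * m22)


def var_ratio_closed_form(m, V):
    """First-order variance of phi.

    m = (m12, m11, m22); V is the 3x3 covariance matrix of those quantities
    in the same order.
    """
    m12, m11, m22 = m
    phi = ratio(m12, m11, m22)
    bracket = (V[0][0] / m12 ** 2
               + V[1][1] / (4 * m11 ** 2)
               + V[2][2] / (4 * m22 ** 2)
               - V[0][1] / (m12 * m11)
               - V[0][2] / (m12 * m22)
               + V[1][2] / (2 * m11 * m22))
    return phi ** 2 * bracket


def var_ratio_by_gradient(m, V, step=1e-6):
    """General formula: sum over i, j of dphi/dm_i * dphi/dm_j * cov(m_i, m_j)."""
    grads = []
    for i in range(3):
        up, down = list(m), list(m)
        up[i] += step
        down[i] -= step
        grads.append((ratio(*up) - ratio(*down)) / (2 * step))  # central difference
    return sum(grads[i] * grads[j] * V[i][j] for i in range(3) for j in range(3))


if __name__ == "__main__":
    m = (2.5, 6.0, 3.5)                      # hypothetical m12, m11, m22
    V = [[0.5, 0.3, 0.2],                    # hypothetical sampling covariances
         [0.3, 1.2, 0.1],
         [0.2, 0.1, 0.6]]
    print(var_ratio_closed_form(m, V), var_ratio_by_gradient(m, V))
```

The numerical-gradient version applies unchanged to other functions of moments, such as the square-root ratio used for the average degree of dominance, by swapping the `ratio` function.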


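Similarly if $\varphi = \sqrt{2m_1/m_2}$, i.e., a function of the form used to estimate $\bar{a}$, the variance of $\varphi$ is

$$\mathrm{var}(\varphi) = \varphi^2\left[\frac{\mathrm{var}(m_1)}{4m_1^2} + \frac{\mathrm{var}(m_2)}{4m_2^2} - \frac{\mathrm{cov}(m_1, m_2)}{2m_1m_2}\right].$$

In particular if $\varphi = s_{mm^*}/\sqrt{s_m^2s_{m^*}^2}$, so that $m_{12} = s_{mm^*}$, $m_{11} = s_m^2$, $m_{22} = s_{m^*}^2$, the variances and covariances in var ($\varphi$), apart from division by $r^2n^2$, are

$$\mathrm{var}(s_{mm^*}) = \mathrm{var}(m_{xy}) + \mathrm{var}(f_{xy}), \qquad \mathrm{var}(s_m^2) = \mathrm{var}(m_{yy}) + \mathrm{var}(f_{yy}),$$
$$\mathrm{cov}(s_{mm^*}, s_m^2) = \mathrm{cov}(m_{xy}, m_{yy}) + \mathrm{cov}(f_{xy}, f_{yy}),$$
$$\mathrm{cov}(s_m^2, s_{m^*}^2) = \mathrm{cov}(m_{yy}, m_{xx}) + \mathrm{cov}(f_{yy}, f_{xx}).$$

Expressions for var ($s_{m^*}^2$) and cov ($s_{mm^*}$, $s_{m^*}^2$) may be written down by an appropriate interchange of subscripts. The variances and covariances involving $s_f^2$, $s_{f^*}^2$, and $s_{ff^*}$ will have the same form, since they are linear combinations of independent mean squares and cross-products.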

From the above expressions, we see our problem reduces to that of 
finding expressions for the variance of a mean square, the variance 
of a mean product, the covariance of a mean square and a mean pro- 
duct, and the covariance of mean squares. It follows from the assump- 
tion of normality and random sampling that the sum of squares and 
mean products for males, females, and the remainder are distributed 
independently according to Wishart’s distribution. 

Suppose we let $a_{ij}$ stand for a mean square or mean product with P degrees of freedom. If i = j, $a_{ij}$ is a mean square and if i ≠ j, $a_{ij}$ is a mean product. Then the covariance of $a_{ij}$ and $a_{kl}$ can be shown to be

$$E(a_{ij} - \sigma_{ij})(a_{kl} - \sigma_{kl}) = (\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk})/P$$

where the $\sigma$’s are the expected values of the corresponding a’s.
From the general case, we get the particular cases of interest. If i = k and j = l, we obtain the variance of a mean product,

$$\mathrm{var}(a_{ij}) = (\sigma_{ii}\sigma_{jj} + \sigma_{ij}^2)/P.$$

Proceeding in the same way we obtain the variance of a mean square, the covariance of a mean square and a mean product, and the covariance of the mean squares. They are listed below in the order given.

$$\mathrm{var}(a_{ii}) = 2\sigma_{ii}^2/P, \qquad \mathrm{cov}(a_{ii}, a_{ij}) = 2\sigma_{ii}\sigma_{ij}/P, \qquad \mathrm{cov}(a_{ii}, a_{jj}) = 2\sigma_{ij}^2/P.$$

Unbiased estimates of these variances and covariances are obtained if we substitute the sample value of the mean squares or products for their expected values and divide by P + 2.


NUMERICAL EXAMPLE 


The data to be presented are for plant and ear height and are taken 
from a study with corn conducted by Robinson and his co-workers 
in 1951. The progenies making up the experimental material were 
produced in accordance with Design I and originated from plants in 
the F, generation of a cross between inbred lines, C121 * NC7. In 
this particular case s = 16, m = 4, n = 4, and r = 2. The analysis
of variance and covariance is presented in Table 2. Note there is one
missing plot, reducing the degrees of freedom for the remainder by
one.


TABLE 2
Design I
ANALYSIS OF VARIANCE AND COVARIANCE FOR PLANT AND EAR HEIGHT*

Source          d.f.    a11 (xx)    a12 (xy)    a22 (yy)
Sets             15     235.9540    84.5967     53.8533
Replications     16      28.2498    14.7096     12.5226
Males            48      77.6431    35.5160     38.6442
Females         192      30.6799    15.0266     11.4386
Remainder       239      10.0099     4.1128      4.0039

*x = plant height, y = ear height.


Since the variance of any estimate is some linear combination of 
the variances and covariances of the mean squares and products in 
Table 2, the first step in the calculations consists of finding the variances 
and covariances of the quantities in question. From the formulae for 
the variances and covariances of the mean squares and mean products 
and Table 2, we find 


$$\operatorname{var}(m_{11}) = \frac{2(77.6431)^2}{50} = 241.1138.$$

Continuing in this way, the desired variances and covariances were
calculated. The results are presented in Table 3.


TABLE 3
VARIANCES AND COVARIANCES OF MEAN SQUARES AND MEAN PRODUCTS

Source       var(a11)    var(a22)   var(a12)   cov(a11, a12)   cov(a22, a12)   cov(a11, a22)
             xx          yy         xy         xx, xy          yy, xy          xx, yy
Males        241.1138    59.7350    85.2368    110.3028        54.8994         50.4554
Females        9.7036     1.3488     2.9729      4.7528         1.7728          2.3278
Remainder      0.8316     0.1330     0.2365      0.3416         0.1366          0.1404
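The entries of Table 3 can be checked with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the male mean square for plant height is taken as 77.6431 (the value appearing in the worked computation of var(m11)), and the divisor P + 2 is used as described in the text. The printed value 241.1138 for var(a11) of males is not reproduced exactly by this arithmetic, so it is left out of the comparison.

```python
def mean_square_cov(a, i, j, k, l, dof):
    """Estimate cov(a_ij, a_kl) for mean squares and products with `dof` d.f.

    Sample values stand in for expected values and the divisor is dof + 2.
    """
    return (a[i][k] * a[j][l] + a[i][l] * a[j][k]) / (dof + 2)


def table3_row(a11, a12, a22, dof):
    """Return the six variance and covariance entries for one source."""
    a = [[a11, a12], [a12, a22]]
    return {
        "var(a11)": mean_square_cov(a, 0, 0, 0, 0, dof),
        "var(a22)": mean_square_cov(a, 1, 1, 1, 1, dof),
        "var(a12)": mean_square_cov(a, 0, 1, 0, 1, dof),
        "cov(a11,a12)": mean_square_cov(a, 0, 0, 0, 1, dof),
        "cov(a22,a12)": mean_square_cov(a, 1, 1, 0, 1, dof),
        "cov(a11,a22)": mean_square_cov(a, 0, 0, 1, 1, dof),
    }


# Mean squares and mean products (xx, xy, yy) with their degrees of freedom.
males = table3_row(77.6431, 35.5160, 38.6442, dof=48)
females = table3_row(30.6799, 15.0266, 11.4386, dof=192)
remainder = table3_row(10.0099, 4.1128, 4.0039, dof=239)

for name, row in (("Males", males), ("Females", females), ("Remainder", remainder)):
    print(name, {key: round(value, 4) for key, value in row.items()})
```

A single general function is used here because every special case listed in the text follows from the general covariance formula by setting subscripts equal.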
The procedure for estimating a genetic parameter and calculating
its sampling variance will be illustrated using the genetic coefficient
of correlation. From the definitions of this parameter and Table 2
we have

$$\frac{s_{mxy}}{\sqrt{s^2_{mx}\,s^2_{my}}} = \frac{2.5612}{\sqrt{(5.8704)(3.4007)}} = 0.5732.$$

We next calculate the variance of this estimate. Toward this end we find

$$\operatorname{var}(s_{mxy}) = 1.3783, \qquad \operatorname{var}(s^2_{mx}) = 3.9190,$$

$$\operatorname{var}(s^2_{my}) = 0.9544, \qquad \operatorname{cov}(s^2_{mx}, s^2_{my}) = 0.8247.$$

Upon substitution of these quantities and the appropriate s's into the
formula for the variance of a function of this form, we obtain
a variance of 0.04847 for the estimate.
Hence, the estimate of the genetic coefficient of correlation together
with its standard error is

$$0.5732 \pm 0.2202.$$


Following the same procedure of calculation, we obtain estimates of
the other parameters of interest. The estimates of the parameters of
interest together with their standard errors are given in Appendix I.
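The point estimates can be reproduced from the mean squares of Table 2. The sketch below is a hedged illustration: the component formulas (male minus female, divided by rn; female minus remainder, divided by r) are not written out in this section and are assumed here because they reproduce the estimates listed in Appendix I. The function name is hypothetical, and the male mean square for plant height is taken as 77.6431.

```python
import math


def design1_components(male, female, remainder, r, n):
    """Return (male component, female component) from three mean squares.

    Assumed forms: (male - female) / (r * n) and (female - remainder) / r.
    """
    return (male - female) / (r * n), (female - remainder) / r


r, n = 2, 4

s2_mx, s2_fx = design1_components(77.6431, 30.6799, 10.0099, r, n)  # plant height
s2_my, s2_fy = design1_components(38.6442, 11.4386, 4.0039, r, n)   # ear height
s_mxy, s_fxy = design1_components(35.5160, 15.0266, 4.1128, r, n)   # covariance

# Genetic coefficient of correlation from the male components.
genetic_correlation = s_mxy / math.sqrt(s2_mx * s2_my)

print(round(s2_mx, 4), round(s2_my, 4), round(s_mxy, 4))
print(round(s2_fx, 4), round(s2_fy, 4), round(s_fxy, 4))
print(round(genetic_correlation, 4))
```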

From inspection of Appendix I, the experimenter may gain some 
insight into the genetic systems governing the determination of plant 
and ear height in the populations under consideration. According to 
theory, a male component of variance or covariance contains only 
additive effects plus additive interactions. A female component of 
variance or covariance contains both additive and dominance inter- 
actions. A difference in the magnitude of the male and female com- 
ponents of variance or covariance, therefore, reflects the presence of 
dominance variance or covariance. 

From Appendix I we see $s^2_{mx}$ and $s^2_{fx}$, the male and female components
for plant height, differ appreciably, indicating the presence of dominance
variance and perhaps some epistatic variance. The male and
female components for ear height, $s^2_{my}$ and $s^2_{fy}$, are clearly within a standard
error of each other, suggesting the absence of dominance variance.
As we might expect, the male and female components of covariance,
$s_{mxy}$ and $s_{fxy}$, seem to differ, suggesting the importance of dominance
in the covariation of the two characteristics.

The measure of the average degree of dominance gives the experimenter
insight into the kind of dominance present. For plant
height we find $\hat{a} = 1.2757$, pointing to overdominance. It should be
noted, however, that this estimate is within one standard error of $a = 1$,
the value $a$ would assume if the genes in question were completely
dominant. The estimate $\hat{a} = 0.4315$ for ear height is in the incomplete
dominance range. This finding is in agreement with those above, i.e.,
it points to the lack of dominance.

The coefficients of correlation are in some sense a measure of the
commonness of the genes governing the determination of two characteristics.
That is, if two characteristics have no genes in common,
we would expect them to be uncorrelated. Not only are the two characteristics
correlated phenotypically (phenotypic correlation 0.8330), but the genetic systems
underlying the determination of the two characteristics seem to be
correlated (genetic correlation 0.5732 and genotypic correlation 0.8874).
Although the genetic systems
seem to be correlated or have a common genetic basis, the genes apparently
act in different ways. In the case of plant height, dominance
seems to be the rule; whereas in the case of ear height, the genes seem
to act in an additive fashion.


SUMMARY 


The concept of genetic variance was extended to genetic covariance. 
Under the assumption genes may act in a pleiotropic fashion, it was 
shown that the genetic covariance could be partitioned in exactly the 
same way as the variance. The genetic, genotypic, and phenotypic 
coefficients of correlation were defined. A method of estimating the 
above parameters and calculating their variances was presented, using 
Design I of Comstock and Robinson. A numerical example was given 
in which a number of parameters including those above were estimated 
and interpreted within the framework of the theory. 
APPENDIX I
ESTIMATES OF PARAMETERS AND STANDARD ERRORS

Male Components of Variance and Covariance:
$s^2_{mx} = 5.8704 \pm 1.9796$, $s^2_{my} = 3.4007 \pm 0.9769$, $s_{mxy} = 2.5612 \pm 1.1740$.

Female Components of Variance and Covariance:
$s^2_{fx} = 10.3350 \pm 1.6229$, $s^2_{fy} = 3.7174 \pm 0.6087$, $s_{fxy} = 5.4569 \pm 0.8975$.

Coefficients of Correlation:
genetic $0.5732 \pm 0.2202$, genotypic $0.8874 \pm 0.0366$, phenotypic $0.8332 \pm 0.0300$.

Measures of Average Degree of Dominance:
Plant Height: $\hat{a} = 1.2757 \pm 0.5247$.
Ear Height: $\hat{a} = 0.4315 \pm 0.8615$.
A SIGNIFICANTLY EXTREME DEVIATE IN DATA WITH
A NON-SIGNIFICANT HETEROGENEITY CHI SQUARE


W. F. BODMER


Department of Genetics, University of Cambridge 
Cambridge, England 


1. Introduction 


It has been emphasized by Yates [1948] that the usual $\chi^2$ calculated
to test heterogeneity in an r × s contingency table covers all forms of
departure from proportionality and so may be insensitive in detecting
departures of any particular specified type. The important general
problem of subdividing such heterogeneity $\chi^2$'s into components of
interest has been described in detail for a number of cases by Fisher
[1936-54] and its general mathematical justification considered by
Irwin [1949] and Lancaster [1949]. The possibility of a non-significant
heterogeneity $\chi^2$ containing a significant component for a particular
type of departure from proportionality does not seem to be as widely
recognized as in the parallel situation in the analysis of variance.

A situation of this type arose in the analysis of data on the pro- 
portion of pin plants (genotype ss) among segregating progenies of 
open pollinated homostyle plants (genotype S’s or S"S") of Primula 
vulgaris. These frequencies are the result of a mixture of self-fertilization
of the parent homostyle and cross-fertilization with other homostyles,
the cross pin × homostyle being illegitimate (Bodmer [1958]).
Homostyles giving rise to segregating progenies must be of the geno- 
type Ss. Hence the expected proportion of pins among progenies 
which are the result of self-fertilization only is }, and among those 
which are the result of cross-fertilization only is 3. The overall pro- 
portion may vary from } to 3 according to the amount of cross-fertili- 
zation and differences in the observed proportion of pins for different 
years will indicate differences in the amount of cross-fertilization in 
these years. 

The nine observed proportions for the years 1946-55 are given in
Table 1. The overall $\chi^2$ for deviation from a 1:3 ratio is highly significant
(p < 0.1%), showing that cross-fertilization did occur. The
heterogeneity $\chi^2$ is 15.1275 for eight degrees of freedom, which is just
less than the corresponding value of 15.507 for the five per cent point.
It is easily seen that the major contributor to the heterogeneity is the
1955 proportion, which is the only one greater than 1/4 and therefore
shows no evidence of any cross-fertilization having taken place in
the corresponding year. A test for the extreme frequency of such a
set of observed binomial frequencies is given in the next section. When
applied to this case it gives a significance level of less than two per
cent for the difference between the extreme frequency and the others.
This relates to the specific hypothesis that the 1955 proportion
represents a year in which no or very little crossing occurred, with
approximately the same high frequency of crossing occurring in all the other
years.


TABLE 1
OBSERVED PROPORTIONS OF PIN PLANTS FOR YEARS 1946-55

Year     Proportion     $\chi^2_1$ for deviation     Contribution to
         of pins        from 1:3                     heterogeneity $\chi^2$
1946       4/34           3.1757                       1.6043
1947      17/104          4.1538                       1.1208
1948      47/257          6.1752                       0.7987
1950      37/212          6.4402                       1.2380
1951      36/171          1.4210                       0.0275
1952      10/54           1.2097                       0.1352
1953      29/164          4.6828                       0.8203
1954      26/134          2.2383                       0.1061
1955     119/452          0.4248                       9.2766
Total    325/1582        16.7560                      15.1275


2. An approximate test for the extreme frequency 


Suppose that we have observed a set of binomial frequencies $p_i = x_i/n_i$ $(i = 1, \ldots, k)$.
Write

$$\bar{p} = \sum x_i \Big/ \sum n_i = x/n$$

and $E(p_i) = p$ for all values of $i$. We can then write the contribution
of the $i$th frequency to the heterogeneity $\chi^2$ of $k - 1$ degrees of freedom
as

$$\chi_i^2 = \frac{(x_i - n_i\bar{p})^2}{n_i\bar{p}(1 - \bar{p})} = \frac{n_i(p_i - \bar{p})^2}{\bar{p}(1 - \bar{p})}, \qquad i = 1, \ldots, k.$$

Now transform the observed frequencies by the angular transformation
$p_i = \sin^2\theta_i$. Then if the $n_i$ are large, the $\theta_i$ will be distributed independently
and approximately as normal deviates with mean $\theta$ and
variance $1/(4n_i)$, where $p = \sin^2\theta$. Under these conditions the maximum
likelihood estimate for $\theta$ is the weighted mean

$$\bar{\theta} = \sum n_i\theta_i / n$$

and we have $\bar{p} \approx \sin^2\bar{\theta}$ so that

$$\chi_i^2 \approx 4n_i(\theta_i - \bar{\theta})^2.$$


McKay [1935] and Nair [1948] have obtained the distribution for
the extreme deviate from the sample mean in a sample from a normal
population whose variance is known. If the $n_i$ are all equal,

$$u = 2\sqrt{n_k}\,(\theta_k - \bar{\theta}) \approx \chi_k,$$

where $\theta_k$ is the extreme angular deviate, will have this distribution.
For moderate variation in the $n_i$ the distribution of $u$ can be taken
as approximately that derived by McKay and Nair. Nair [1948] has
tabulated the percentage points of this distribution for various values
of $k$, and his table is reproduced in Biometrika Tables for Statisticians
(Table 25). A simple approximate test for the extreme frequency is
thus given by comparing the square root of its contribution to the
heterogeneity $\chi^2$ with the percentage points tabulated by Nair [1948]
for the corresponding value of $k$. It should be noted that the values
given by Nair are appropriate for a one-tailed test, when it is known
in which direction an extreme deviate is to be expected. For a two-tailed
test therefore, when this knowledge is lacking, the corresponding
significance levels must be doubled.


3. Application of the Test 


The contributions of the observed proportions of pins to the heterogeneity
$\chi^2$ are given in Table 1 alongside the original data. The contribution
of the 1955 proportion is 9.2766, and the square root of this,
3.046, entered in Nair's tables of the probability integral for k = 9,
gives P = 0.56%. Hence this extreme proportion differs from the
others at a significance level of about 1.12%. The value of u obtained
by applying the angular transformation to the observed data was
2.989, giving a significance level of 1.36%. The heterogeneity $\chi^2$ for
the years 1946-54 only is 2.3590 for seven degrees of freedom, showing
that the excessively high proportion observed in 1955 entirely accounts
for the inflated overall heterogeneity $\chi^2$.
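The arithmetic of the test can be sketched in a few lines. The code below is an illustration, not the author's computation: the function names are hypothetical, the contribution of each year is taken as n_i(p_i - p̄)²/(p̄(1 - p̄)), and the angular statistic is taken as u = 2√n_k(θ_k - θ̄) with θ in radians. Both forms are assumptions read from the surrounding text. Unrounded arithmetic gives values that differ slightly in the later decimals from the printed 9.2766 and 2.989.

```python
import math

# Pins observed and progeny size for each year, from Table 1.
PIN_COUNTS = {
    1946: (4, 34), 1947: (17, 104), 1948: (47, 257),
    1950: (37, 212), 1951: (36, 171), 1952: (10, 54),
    1953: (29, 164), 1954: (26, 134), 1955: (119, 452),
}


def heterogeneity_contributions(counts):
    """Contribution of each frequency to the heterogeneity chi-square."""
    x = sum(xi for xi, _ in counts.values())
    n = sum(ni for _, ni in counts.values())
    p_bar = x / n
    return {
        year: ni * (xi / ni - p_bar) ** 2 / (p_bar * (1.0 - p_bar))
        for year, (xi, ni) in counts.items()
    }


def angular_deviate(counts, year):
    """u = 2 * sqrt(n_k) * (theta_k - theta_bar), angles in radians."""
    n = sum(ni for _, ni in counts.values())
    theta = {y: math.asin(math.sqrt(xi / ni)) for y, (xi, ni) in counts.items()}
    theta_bar = sum(ni * theta[y] for y, (_, ni) in counts.items()) / n
    return 2.0 * math.sqrt(counts[year][1]) * (theta[year] - theta_bar)


contributions = heterogeneity_contributions(PIN_COUNTS)
extreme_year = max(contributions, key=contributions.get)
root_contribution = math.sqrt(contributions[extreme_year])
u = angular_deviate(PIN_COUNTS, extreme_year)

print(extreme_year, round(contributions[extreme_year], 3))
print(round(root_contribution, 3), round(u, 3))
```

Either `root_contribution` or `u` would then be referred to Nair's table for k = 9, which is not reproduced here.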
If in 1955 we had observed only 114 pins out of 452 instead of 119,
the overall heterogeneity $\chi^2$ would have been 11.96 for eight degrees
of freedom, giving a probability of 16%. However the proportion
114/452 = 0.2522 still shows no evidence for cross-fertilization having
occurred. Its contribution to the heterogeneity $\chi^2$ is 6.977, giving a
value higher than that for the five per cent point for a two-tailed test.
Thus a significantly high extreme frequency might easily have been
overlooked if judgment had been based only on the overall heterogeneity
$\chi^2$.

It should be noted that such a test for the extreme frequency is 
not intended as a simple alternative to the usual test for heterogeneity. 
The use of the test implies the testing of the specific hypothesis that the 
1955 proportion represents a year in which no or very little crossing 
occurred as opposed to the other years representing more or less the 
same proportion of crossing, this being a possibility with some bio- 
logical relevance. The occurrence of a year in which no crossing occurred 
can only be recognized by the observation of an unusually high pro- 
portion of pins. Such situations are quite common, at least in the 
analysis of genetic data. A good example is provided by the analysis 
of data relating to affinity, which is a linkage type association between 
characters on different chromosomes (M. E. Wallace [1958]). In order 
to find such an association it is sometimes necessary to search for 
an individual giving an unusually high proportion of either re-combi- 
nants or non-recombinants of the characters concerned, in a back- 
cross mating (see loc. cit. p. 232-3). Such individuals can only be 
recognized by the data they produce and then to test the reality of the 
association some such test as the one given is needed. 

It would seem therefore that in general care must be taken in
assuming a body of data to be homogeneous only on the basis of a
non-significant heterogeneity $\chi^2$. If departures of a specific type are
to be expected then these must also be tested by an appropriate test
of significance.
COMBINING UNBIASED ESTIMATORS

1. Introduction. In applied statistics the problem of combining two
unbiased estimators often arises. If the estimators have unequal variances,
then the problem of how to weight the estimators for a best
combined linear estimator is not straightforward if the ratio of the
variances is unknown. For example, if $\hat{\alpha}_1$ and $\hat{\alpha}_2$ are independent
unbiased estimators of $\alpha$ with variances $\sigma_1^2$ and $\sigma_2^2$ respectively, then
the linear combined estimator which is unbiased with minimum variance
is

$$\hat{\alpha} = (\sigma_2^2\hat{\alpha}_1 + \sigma_1^2\hat{\alpha}_2)/(\sigma_1^2 + \sigma_2^2).$$

The variance of $\hat{\alpha}$ is equal to

$$\sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2).$$

In general the $\sigma_i^2$ are not known so this estimate cannot be used.
If $c$ is any constant such that $0 < c < 1$, then $\hat{\alpha}_1c + \hat{\alpha}_2(1 - c)$ is a
linear unbiased estimator of $\alpha$. It is well known that for any fixed
value of $c$ the variance of $\hat{\alpha}_1c + \hat{\alpha}_2(1 - c)$ is greater than either $\sigma_1^2$
or $\sigma_2^2$ for some values of $\sigma_1^2$ and $\sigma_2^2$. That is to say, there is no constant
$c$ such that the estimator $\hat{\alpha}_1c + \hat{\alpha}_2(1 - c)$ has a smaller variance than
does $\hat{\alpha}_1$ or $\hat{\alpha}_2$, for all possible values of $\sigma_1^2$ and $\sigma_2^2$.

This means that no matter what constant $c$ is used to weight $\hat{\alpha}_1$
and $\hat{\alpha}_2$, the value of the variance ratio may be such that it is better to
use $\hat{\alpha}_1$ alone or $\hat{\alpha}_2$ alone to estimate $\alpha$.

We will make use of the following definition: Let $\hat{\alpha}_1$, $\hat{\alpha}_2$ be independent
unbiased estimators of $\alpha$ with variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
Let $\hat{\alpha} = c\hat{\alpha}_1 + (1 - c)\hat{\alpha}_2$ $(0 < c < 1)$ be a linear combined estimator
with weights $c$ and $1 - c$. If the variance of $\hat{\alpha}$ is less than or equal to
$\sigma_1^2$ and less than or equal to $\sigma_2^2$ for all values of $\sigma_1^2$ and $\sigma_2^2$, then $\hat{\alpha}$ will be
called a uniformly better unbiased estimator of $\alpha$ (uniformly better than
$\hat{\alpha}_1$ or $\hat{\alpha}_2$).
As we stated above there is no constant weight which gives rise to
uniformly better unbiased estimators. This suggests that we might
use random weights.

The purpose of this paper is to exhibit a set of random variables
which can be used as weights on $\hat{\alpha}_1$ and $\hat{\alpha}_2$ so that the weighted estimator
is a uniformly better unbiased estimator.

This problem is particularly important in combining inter-block 
and intra-block estimators in the incomplete block designs. Many 
statisticians give the advice that inter-block information should not 
be used unless the number of blocks in the experiment is “large” [1] 
[2]. This is due to the fact that the weights used in combining inter- 
block and intra-block estimators are subject to sampling variations. 
In this paper we will state the block and treatment sizes for which inter- 
block information should always be used. 


2. Combining Estimators in the Two-Group Problem. Let $x$ be distributed
as a normal variable with mean $\mu$ and variance $\sigma_1^2/n_1$. Let $y$
be distributed as a normal variable with mean $\mu$ and variance $\sigma_2^2/n_2$.
Let $m_1s_1^2/\sigma_1^2$ be distributed as a chi-square variable with $m_1$ degrees
of freedom, and let $m_2s_2^2/\sigma_2^2$ be similarly distributed with $m_2$ degrees
of freedom. Suppose all random variables are independent. If the $\sigma_i^2/n_i$
are known, then the minimum variance unbiased estimator of $\mu$ is

$$(n_1x\sigma_2^2 + n_2y\sigma_1^2)/(n_1\sigma_2^2 + n_2\sigma_1^2).$$

As we stated above there exists no constant $c$ such that $xc + y(1 - c)$
is a uniformly better unbiased estimator of $\mu$. We will replace $\sigma_i^2$ by
$s_i^2$ in the above combined estimator and prove the following:

Theorem 1. Under the above conditions on the random variables $x$, $y$, $s_1^2$,
and $s_2^2$, a necessary and sufficient condition that the quantity

$$\hat{\mu} = (n_1xs_2^2 + n_2ys_1^2)/(n_1s_2^2 + n_2s_1^2)$$

is an unbiased estimator of $\mu$ which is uniformly better than either $x$ or $y$
is that $m_1$ and $m_2$ are both larger than nine.


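Proof: Since all random variables are independent, the conditional
expectation of $\hat{\mu}$ can be taken. That is to say, we will take the expected
value of $\hat{\mu}$ in the conditional distribution of $x$ and $y$ holding $s_1^2$ and $s_2^2$
fixed. This we will denote by $E_{x,y}(\hat{\mu} \mid s_1^2, s_2^2)$ and is equal to

$$(n_1\mu s_2^2 + n_2\mu s_1^2)/(n_1s_2^2 + n_2s_1^2).$$

We then take the expectation of this quantity with respect to $s_1^2$ and $s_2^2$.
But this is simply the expectation of a constant $\mu$ and hence equals
$\mu$. So we have proved that $\hat{\mu}$ is unbiased. Now for the variance of
$\hat{\mu}$. If we let $\gamma = n_1s_2^2/(n_1s_2^2 + n_2s_1^2)$, we get ($\gamma$, $x$, $y$ are mutually
independent),

$$\operatorname{var}(\hat{\mu}) = E[\gamma(x - \mu) + (1 - \gamma)(y - \mu)]^2 = E[\gamma^2(x - \mu)^2] + 2E[\gamma(1 - \gamma)(x - \mu)(y - \mu)] + E[(1 - \gamma)^2(y - \mu)^2].$$

This second term is zero so this leaves us

$$\operatorname{var}(\hat{\mu}) = E[\gamma^2(x - \mu)^2] + E[(1 - \gamma)^2(y - \mu)^2] = (\sigma_1^2/n_1)E(\gamma^2) + (\sigma_2^2/n_2)E[(1 - \gamma)^2].$$

Now

$$E(\gamma^2) = E[n_1^2s_2^4/(n_1s_2^2 + n_2s_1^2)^2] = E[1/\{1 + (n_2s_1^2/n_1s_2^2)\}^2] = E[1/(1 + av)^2]$$

where

$$a = n_2\sigma_1^2/n_1\sigma_2^2 \quad \text{and} \quad v = s_1^2\sigma_2^2/s_2^2\sigma_1^2. \tag{1}$$

Similarly we get

$$E[(1 - \gamma)^2] = E[n_2^2s_1^4/(n_1s_2^2 + n_2s_1^2)^2] = E[1/\{1 + (n_1s_2^2/n_2s_1^2)\}^2] = E[1/\{1 + (1/av)\}^2] = E[a^2v^2/(1 + av)^2].$$

This gives

$$\operatorname{var}\hat{\mu} = (\sigma_1^2/n_1)E[1/(1 + av)^2] + (\sigma_2^2/n_2)E[a^2v^2/(1 + av)^2]. \tag{2}$$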


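Two cases must be considered; i.e. $\sigma_1^2/n_1 < \sigma_2^2/n_2$ and $\sigma_2^2/n_2 < \sigma_1^2/n_1$.

If $\sigma_1^2/n_1 < \sigma_2^2/n_2$, then $0 < a < 1$, and we need to show that
$\operatorname{var}\hat{\mu} \le \sigma_1^2/n_1$. We get

$$\operatorname{var}\hat{\mu} = (\sigma_1^2/n_1)E[\{1/(1 + av)^2\} + (1/a)\{a^2v^2/(1 + av)^2\}] = (\sigma_1^2/n_1)E[(1 + av^2)/(1 + av)^2].$$

To prove the necessary part of the theorem we need to find the values
of $m_1$ and $m_2$ such that

$$E[(1 + av^2)/(1 + av)^2] \le 1. \tag{3}$$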


We see that $v$ is distributed as Snedecor's $F$ with $m_1$ degrees of freedom
in the numerator and $m_2$ degrees of freedom in the denominator. If
$\sigma_2^2/n_2 < \sigma_1^2/n_1$, then $1 < a < \infty$ and we can write equation (2) as

$$\operatorname{var}\hat{\mu} = (\sigma_2^2/n_2)E[(a^2v^2 + a)/(1 + av)^2].$$

If we let $v = 1/w$ and $d = 1/a$, we get

$$\operatorname{var}\hat{\mu} = (\sigma_2^2/n_2)E[\{(1/d^2w^2) + (1/d)\}/\{1 + (1/dw)\}^2] = (\sigma_2^2/n_2)E[(1 + dw^2)/(1 + dw)^2].$$

So we must find the value of $m_1$ and $m_2$ such that

$$E[(1 + dw^2)/(1 + dw)^2] \le 1 \tag{4}$$

for all $d$ where $0 < d < 1$. But $w$ is distributed as Snedecor's $F$ with
$m_2$ degrees of freedom in the numerator and $m_1$ degrees of freedom in
the denominator.

Since we do not know whether $\sigma_1^2/n_1$ is larger or smaller than $\sigma_2^2/n_2$,
we must find the value of $m_1$ and $m_2$ such that (3) and (4) are both
satisfied.

From (3) we see that we must find values of $m_1$ and $m_2$ such that the
following inequality is satisfied for all $a$ in the interval $0 < a < 1$.

$$\frac{\Gamma\left(\frac{m_1 + m_2}{2}\right)}{\Gamma\left(\frac{m_1}{2}\right)\Gamma\left(\frac{m_2}{2}\right)}\left(\frac{m_1}{m_2}\right)^{m_1/2}\int_0^\infty \frac{1 + av^2}{(1 + av)^2}\cdot\frac{v^{(m_1/2) - 1}}{\left(1 + \frac{m_1}{m_2}v\right)^{(m_1 + m_2)/2}}\,dv \le 1.$$

The integral is a function of $m_1$, $m_2$, and $a$, say $f(a, m_1, m_2)$ or simply
$f(a)$. The problem then is to find the values of $m_1$ and $m_2$ such that
$f(a) \le 1$ for all $a$ between zero and one. If $m_2 > 4$, we can differentiate
under the integral sign and interchange the operations of integration
and taking a limit on $a$. In what follows we will assume $m_2 > 4$.

Now if $a = 0$, the integral is equal to one; i.e. $f(0) = 1$. The function
$f(a)$ is a continuous function of $a$ for $0 < a < 1$ and differentiable.
We will examine the derivative of $f(a)$ at the point $a = 0$.

Now

$$f'(a) = \frac{d}{da}E\left[\frac{1 + av^2}{(1 + av)^2}\right] = E\left[\frac{\partial}{\partial a}\,\frac{1 + av^2}{(1 + av)^2}\right]$$

and so

$$f'(0) = E(v^2 - 2v).$$

But

$$E(v^2 - 2v) = \frac{(m_1 + 2)m_2^2}{(m_2 - 2)(m_2 - 4)m_1} - \frac{2m_2}{m_2 - 2},$$

so

$$f'(0) = \frac{m_2}{m_2 - 2}\left[\frac{(m_1 + 2)m_2}{m_1(m_2 - 4)} - 2\right].$$

Now $f(0) = 1$, so if $f'(0) > 0$, then the slope of $f(a)$ is positive at $a = 0$.
Therefore, $f(a)$ must be greater than 1 for some value of $a$ in the neighborhood
of $a = 0$.

Now $f'(0) > 0$ for those values of $m_1$ and $m_2$ for which

$$(m_1 + 2)m_2/m_1(m_2 - 4) > 2.$$

Therefore,

$$f'(0) > 0 \quad \text{if} \quad m_2 = 9;\ m_1 < 18.$$
If we let

$$E[(1 + dw^2)/(1 + dw)^2]$$

in equation (4) be equal to $g(d)$, then $g(0) = 1$ and $g'(0) > 0$ if $m_1 = 9$;
$m_2 < 18$. We have shown that if either $m_1$ or $m_2$ is less than 10 and
greater than 4, then $\hat{\mu}$ is not a uniformly better estimator than $x$ or $y$.
While we have assumed that $m_1$ and $m_2$ in the above proof are each
greater than 4, a similar proof goes through if either $m_1$ or $m_2$ is less
than or equal to four. We will omit this proof.

We will now show that $\hat{\mu}$ is a uniformly better unbiased estimator
of $\mu$ if $m_1$ and $m_2$ are both greater than nine. If we let

$$h(v, a) = h(v) = (1 + av^2)/(1 + av)^2$$

then

$$h(0) = 1, \quad h(1) = 1/(1 + a), \quad h(2/(1 - a)) = 1, \quad h(\infty) = 1/a.$$

The minimum value of $h(v)$ is $1/(1 + a)$ and occurs at $v = 1$.

We will approximate the curve $h(v)$ by the parabola $u - \gamma = \delta(v - \beta)^2$
by forcing it through the point $u = 1$; $v = 0$ and putting the
center of the parabola through the point $u = 1/(1 + a)$, $v = 1$. Substituting
these values gives $\gamma = 1/(1 + a)$; $\beta = 1$; $\delta = a/(1 + a)$.
The equation for the parabola is

$$u = \frac{1}{1 + a} + \frac{a}{1 + a}(v - 1)^2. \tag{5}$$

548 BIOMETRICS, DECEMBER 1959 


It can be easily shown that $h(v) \le u$ for all values of $v$ and $a$ satisfying
$0 < v < \infty$, $0 < a < 1$. This implies that $E[h(v)] \le E(u)$ for all $a$
in the interval $0 < a < 1$. But $E(u) = [1 + aE(v - 1)^2]/(a + 1) =
1 + aE(v^2 - 2v)/(a + 1)$. But $E(v^2 - 2v)$ is negative or zero if $m_1$
and $m_2$ are each greater than nine. This then gives us the desired
inequality $E[h(v)] \le E(u) \le 1$ for all $a$ where $0 < a < 1$. We can
show that the same conditions are sufficient for the inequality in (4).
This completes the proof of the theorem.
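The sign of $E(v^2 - 2v)$, on which both halves of the proof turn, can be checked exactly with rational arithmetic. The sketch below uses the standard first two moments of Snedecor's F with $m_1$ and $m_2$ degrees of freedom; the function name is hypothetical and the check is illustrative rather than part of the proof.

```python
from fractions import Fraction


def slope_at_zero(m1, m2):
    """Return E(v^2 - 2v) exactly for v distributed as F(m1, m2), m2 > 4."""
    if m2 <= 4:
        raise ValueError("the second moment of F requires m2 > 4")
    m1, m2 = Fraction(m1), Fraction(m2)
    mean_v = m2 / (m2 - 2)
    mean_v_squared = m2 ** 2 * (m1 + 2) / (m1 * (m2 - 2) * (m2 - 4))
    return mean_v_squared - 2 * mean_v


# Boundary cases: the slope vanishes at (10, 10) and at (18, 9).
print(slope_at_zero(10, 10), slope_at_zero(18, 9))

# With m2 = 9 the slope is positive whenever m1 < 18.
print(all(slope_at_zero(m1, 9) > 0 for m1 in range(1, 18)))

# With both degrees of freedom above nine the slope is never positive.
print(all(slope_at_zero(m1, m2) <= 0
          for m1 in range(10, 60) for m2 in range(10, 60)))
```

The condition $E(v^2 - 2v) \le 0$ is equivalent to $(m_1 - 2)(m_2 - 8) \ge 16$, which is why the pair (10, 10) sits exactly on the boundary.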

Example 1. Let $x_1, x_2, \ldots, x_{N_1}$ be a random sample from a normal
population with mean $\mu$ and variance $\sigma_1^2$. Let $y_1, y_2, \ldots, y_{N_2}$ be a
random sample from a normal population with mean $\mu$ and variance
$\sigma_2^2$. The quantities $\bar{x}$, $\bar{y}$, $s_1^2$, $s_2^2$ are minimal sufficient statistics for
$\mu$, $\sigma_1^2$, and $\sigma_2^2$, where

$$s_1^2 = \sum (x_i - \bar{x})^2/(N_1 - 1), \qquad s_2^2 = \sum (y_i - \bar{y})^2/(N_2 - 1).$$

By Theorem 1 the quantity

$$(N_1\bar{x}s_2^2 + N_2\bar{y}s_1^2)/(N_1s_2^2 + N_2s_1^2)$$

is a uniformly better unbiased estimate of $\mu$ than is $\bar{x}$ or $\bar{y}$ if and only
if $N_1$ and $N_2$ are each greater than 10.
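A small simulation gives a feel for Example 1. The sketch below is an illustration under stated assumptions: the function names, sample sizes, standard deviations, replication count, and seed are all hypothetical choices, and the combined estimate is the two-sample quantity of Example 1 with each sample mean weighted by its sample size times the other sample's variance.

```python
import random


def sample_mean_var(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def combined_mean(xs, ys):
    """Weighted mean of two samples with estimated variances as weights."""
    n1, n2 = len(xs), len(ys)
    x_bar, s1 = sample_mean_var(xs)
    y_bar, s2 = sample_mean_var(ys)
    return (n1 * x_bar * s2 + n2 * y_bar * s1) / (n1 * s2 + n2 * s1)


def simulated_variance(n1, n2, sd1, sd2, reps=4000, seed=1959):
    """Monte Carlo variance of the combined mean when the true mean is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sd1) for _ in range(n1)]
        ys = [rng.gauss(0.0, sd2) for _ in range(n2)]
        total += combined_mean(xs, ys) ** 2
    return total / reps


# Two samples of 20 with standard deviations 1 and 2.
var_combined = simulated_variance(20, 20, 1.0, 2.0)
var_x_bar = 1.0 ** 2 / 20   # variance of the better single mean
var_known = 1.0 / (20 / 1.0 ** 2 + 20 / 2.0 ** 2)  # known-variance optimum

print(var_known, var_combined, var_x_bar)
```

With both sample sizes above 10, the simulated variance should fall between the known-variance optimum and the variance of the better single mean, up to Monte Carlo error.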

Example 2. This example will refer to the recovery of inter-block
information in a balanced incomplete block design. We will use exactly
the notation used on pages 532 and 537 in [3]. Suppose we want to
estimate the treatment difference $\tau_i - \tau_j$. The intra-block estimate
is $(Q_i - Q_j)/rE$ with variance $2\sigma^2/rE$. The inter-block estimate of
$\tau_i - \tau_j$ is $(T_i - T_j)/(r - \lambda)$ with variance $[2k/(r - \lambda)](\sigma^2 + k\sigma_b^2)$.
The best linear unbiased combined estimate is given in equation (10)
page 535 of [3] and is equal to

$$\frac{\dfrac{Q_i - Q_j}{rE}\cdot\dfrac{rE}{2\sigma^2} + \dfrac{T_i - T_j}{r - \lambda}\cdot\dfrac{r - \lambda}{2k(\sigma^2 + k\sigma_b^2)}}{(rE/2\sigma^2) + (r - \lambda)/2k(\sigma^2 + k\sigma_b^2)}.$$

To use Theorem 1 we can let $x = (Q_i - Q_j)/rE$; $y = (T_i - T_j)/(r - \lambda)$;
$\sigma_1^2 = \sigma^2$; $n_1 = rE/2$; $\sigma_2^2 = \sigma^2 + k\sigma_b^2$; $n_2 = (r - \lambda)/2k$; $s_1^2$ = intra-block
mean square; $s_2^2$ = remainder (blocks ignoring treatments) mean square;
$m_1 = rt - t - b + 1$; $m_2 = b - t$. In this example $\sigma_1^2/n_1 < \sigma_2^2/n_2$.

Therefore, the quantity

$$[k(Q_i - Q_j)s_2^2 + (T_i - T_j)s_1^2]/[krEs_2^2 + (r - \lambda)s_1^2]$$

is a uniformly better unbiased estimator of $\tau_i - \tau_j$ than is either
$(Q_i - Q_j)/rE$ or $(T_i - T_j)/(r - \lambda)$ if either of the following is true:
(i) $rt - b - t + 1 \ge 18$ and $b - t = 9$ or (ii) $b - t \ge 10$. So if (i)
or (ii) is true, then inter-block information should always be used.*
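3. Inaccuracies Due to Estimating the Weights in the Balanced Incomplete
Block. Under the conditions of Theorem 1 let

$$\mu^* = (n_1x\sigma_2^2 + n_2y\sigma_1^2)/(n_1\sigma_2^2 + n_2\sigma_1^2)$$

be the combined estimator of $\mu$ when $\sigma_1^2$ and $\sigma_2^2$ are known. We want
to compare $\operatorname{var}\hat{\mu}$ and $\operatorname{var}\mu^*$ to see how the variance is affected by
using $s_1^2$ and $s_2^2$ instead of $\sigma_1^2$ and $\sigma_2^2$. The quantity

$$P = 100(\operatorname{var}\hat{\mu} - \operatorname{var}\mu^*)/\operatorname{var}\mu^*$$

is the percentage error in the variance of the estimator of $\mu$ from using
$s_1^2$ and $s_2^2$ instead of $\sigma_1^2$ and $\sigma_2^2$. We will find an upper bound on $P$.
The quantities $\sigma_1^2$, $\sigma_2^2$, $n_1$, $n_2$, $m_1$, $m_2$ are defined in Example 2 of
the preceding section. Clearly $\sigma_1^2/n_1 < \sigma_2^2/n_2$. Now $\operatorname{var}\hat{\mu} =
(\sigma_1^2/n_1)E[h(v)] \le (\sigma_1^2/n_1)E(u)$ where $u$ is given by equation (5). Therefore,

$$100\,\frac{\operatorname{var}\hat{\mu}}{\operatorname{var}\mu^*} \le 100\,\frac{\sigma_1^2}{n_1}\left[1 + \frac{aL}{1 + a}\right]\Big/\operatorname{var}\mu^*,$$

where $a$ is defined in (1) and where

$$L = E(v^2 - 2v) = \frac{m_2}{m_2 - 2}\left[\frac{(m_1 + 2)m_2}{m_1(m_2 - 4)} - 2\right].$$

Also

$$\operatorname{var}\mu^* = \sigma_1^2\sigma_2^2/(n_1\sigma_2^2 + n_2\sigma_1^2) = (\sigma_1^2/n_1)/(1 + a).$$

So

$$P \le 100\{[1 + aL/(1 + a)](1 + a) - 1\} = 100a(1 + L).$$

But

$$a = \frac{n_2\sigma_1^2}{n_1\sigma_2^2} = \frac{(r - \lambda)\sigma^2}{krE(\sigma^2 + k\sigma_b^2)} \le \frac{r - \lambda}{\lambda t}.$$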


*Notice that the method used in this paper for combining inter-block and intra-block estimates 
is not the one used by Yates [33]. 
So

$$P \le 100\,\frac{(r - \lambda)(1 + L)}{\lambda t},$$

and so $100(r - \lambda)(1 + L)/\lambda t$
is an upper bound on the percentage error $P$. The upper bound has
been calculated for a few designs in the table below:


 t     k     r     b     λ     Upper bound on P

 6     3    10    20     4     9.8%
 6     4    10    15     6     9.5%
10     3     9    30     2     7.8%


If $\sigma_b^2 > \sigma^2$, then the upper bound on $P$ is less than or equal to

$$100(r - \lambda)(1 + L)/\lambda t(1 + k),$$

which in the cases cited above are 2.45%, 1.90%, and 1.95% respectively.
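The tabulated bounds can be recomputed from the design parameters. The sketch below is a reading of the formulas rather than the author's computation: it assumes the bound is $100(r - \lambda)(1 + L)/\lambda t$ with $L = E(v^2 - 2v)$, $m_1 = rt - t - b + 1$ and $m_2 = b - t$, and that the tighter bound divides further by $1 + k$. The function name and the `blocks_dominate` flag are hypothetical. For the third design this arithmetic gives about 7.7%, slightly below the printed 7.8%.

```python
def percent_error_bound(t, k, r, b, lam, blocks_dominate=False):
    """Upper bound on the percentage error P for a balanced incomplete block.

    t treatments, block size k, r replicates, b blocks, lam = lambda.
    If blocks_dominate is True the bound is divided by (1 + k).
    """
    m1 = r * t - t - b + 1        # intra-block error degrees of freedom
    m2 = b - t                    # remainder degrees of freedom
    L = (m2 / (m2 - 2)) * ((m1 + 2) * m2 / (m1 * (m2 - 4)) - 2)
    a_max = (r - lam) / (lam * t)
    if blocks_dominate:
        a_max /= 1 + k
    return 100.0 * a_max * (1.0 + L)


designs = [(6, 3, 10, 20, 4), (6, 4, 10, 15, 6), (10, 3, 9, 30, 2)]
for design in designs:
    print(design,
          round(percent_error_bound(*design), 2),
          round(percent_error_bound(*design, blocks_dominate=True), 2))
```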
AN ILLUSTRATION OF THE USE OF STOCHASTIC

An ingenious method, called stochastic approximation, has recently
been devised by mathematical statisticians for a commonly occurring
problem in biological research (and elsewhere). Briefly stated, stochastic
approximation is concerned with the regression of a variable $y$ on a
variable $x$, and seeks that value $x^*$ for which the regression value of
$y$ is some preassigned number $y^*$. The estimation procedure for $x^*$
is sequential and distribution-free. Despite its extreme simplicity in
application and the wide variety of situations in which it may be useful,
the technique has not as yet been taken advantage of by empirical
research workers. One reason for this may be that the existing literature
is addressed primarily to professional mathematicians (a review of the
literature is given in [1]). Another reason may be that the mathematical
theory itself is not yet complete for relatively small samples.

An empirical research project is reported in the present paper in 
which the technique proved to be highly valuable and economical: 
it enabled us to arrive relatively quickly at a good estimate of the point 
of time at which a certain biological effect was attained. The problem 
was to estimate the time of onset of the action of kinetin on division 
in Paramecium caudatum where increased rates of cell division have 
recently been induced by treatment with this substance [2]. We shall 
explain in detail how stochastic approximation was used to help solve 
this problem. It is hoped that this example may stimulate the use of 
this procedure in similar and other varieties of problems. It is also 
hoped that mathematical statisticians will perfect the procedure for
dealing with relatively small samples.

A beautiful feature of stochastic approximation is the lack of assump- 
tions required. In many problems, like our kinetin one, the researcher 
has no clear picture of the structure of the relationship he wishes to 
study and would prefer not to commit himself, if possible, to hypotheses 
of precise shapes of regression or other distributional features. In such 
a case, he needs a statistical procedure which is distribution-free. 

With respect to the action of kinetin, we were willing to assume no
more than that the ratio of the number of daily divisions in paramecia
treated with kinetin, compared to untreated, increased with exposure.
More precisely, we were willing to assume that the ratio of experimental
to control divisions (K/C) had a regression on length of exposure that
was monotonely increasing. Beyond the monotone property, there was
no basis for supposing in advance whether the regression was linear
or of any particular curvilinear nature, or whether homoscedasticity or
any particular heteroscedasticity would be obtained. We did not even
know whether to expect normality or non-normality of distribution
for the deviations from the regression.

The observations necessary to clarify any of the above properties 
of the regression were prohibitive to us, and actually not essential to 
the main purpose of the research. Use of stochastic approximation 
enabled us to bypass all these problems and to focus directly on the 
points of major interest with a minimum of effort. (As a by-product, 
some data were actually yielded regarding the nature of the unknown 
regression. ) 

Several varieties of techniques for stochastic approximation are 
already available [1]. The original one was developed by Robbins and 
Monro for precisely the kind of regression that concerns us here: mono- 
tone increasing [4]. So it is the Robbins-Monro procedure that is 
specifically illustrated in this paper. 

The object of our first experimental series was to determine how 
long (how many hours) the paramecia would need to be kept in the 
kinetin-containing medium in order to achieve a ratio K/C of 1.10. 
We guessed crudely at 30 hours of treatment as a first approximation 
and experimented accordingly. Ten individual culture slides of para- 
mecia were treated with kinetin for 30 hours, while a control group of 
slides was kept untreated for the same period. The observed ratio of 
divisions was found to be 1.067. This ratio was less than the desired 
one of 1.10; hence for the next trial, hours of treatment were increased 
from 30 to 30.7 according to a formula to be explained below. This 
period gave an experimental ratio too large (1.50); the formula told us 
to set the third trial at 28.7 hours of treatment. Since the resulting
K/C was still too large (1.13), the fourth trial cut down experimental 
time even more. 

With the sixth trial, the estimated treatment time required for the 
1.1 ratio was approximately 25 hours and the estimate stayed at about 
this level thereafter. Our conclusion therefore was that 25 hours was 
approximately the time required to give a K/C ratio of 1.1. 

The crux of the procedure is the formula to correct a given estimate 
to obtain the next one in the sequence. Let us now describe in detail 
the exact formula that was used for the successive corrections. Let 

$x_n$ = the number of hours of treatment of trial $n$, 

$y_n$ = K/C obtained after a trial of $x_n$ hours, 

$y^*$ = K/C to be tested (1.1 in this case), 

$a_n$ = a weight for trial $n$ (fixed in advance by a simple formula to 
be explained), 

$x^*$ = number of hours of treatment for which $y^*$ is the expected 
ratio. 

$y^*$ and all the $a_n$ are fixed in advance, and hence are known. 
Each $x_n$ and $y_n$ become known in turn during the course of the sequential 
procedure. The unknown to be solved for is $x^*$, and it is to this that the 
$x_n$ converge as $n$ increases. 

$x_1$ is the first guess of the value $x^*$, and is chosen arbitrarily. The 
process will converge no matter how bad a guess $x_1$ is, but it may 
converge faster the closer $x_1$ is to $x^*$. For our first experiment, where 
$y^* = 1.1$, we chose $x_1$ to be 30 hours. 

The $a_n$ were selected by the formula $a_n = 20/n$ ($n = 1, 2, 3, \ldots$). 
Thus $a_1 = 20$, $a_2 = 20/2$, $a_3 = 20/3$, etc. The purpose of such weights 
is to allow for large corrections in the beginning of the experiment and 
for corrections of decreasing size as $x_n$ approaches $x^*$. These weights 
satisfy the conditions 

$$\sum_{n=1}^{\infty} a_n = \infty, \qquad \sum_{n=1}^{\infty} a_n^2 < \infty,$$

which are necessary and sufficient for the process to converge to $x^*$ [1]. 
Values of $x_2$, $x_3$, etc. were then determined sequentially from the 
experimental results according to the formula: 

$$x_{n+1} = x_n + a_n (y^* - y_n). \qquad (1)$$

In our first trial, divisions were counted after 30 hours ($x_1 = 30$). 
K/C was then computed and found to be 1.067 (Table 1, line 1). Substituting 
in formula (1), we found 

$$x_2 = 30 + 20(1.1 - 1.067) = 30.66 \text{ hours}.$$

In the succeeding trial, divisions were therefore counted after 30 
hours and 40 minutes; the ratio obtained at that time was K/C = 1.30, 
or $y_2 = 1.30$ (Table 1, line 2). $x_3$ was then computed to be 

$$x_3 = 30.7 + \frac{20}{2}(1.1 - 1.3) = 28.7 \text{ hours}.$$

At each successive stage of experimentation, a new level was similarly 
chosen, based upon the deviation of the previous response from $y^*$ and 
on the number of trials already performed (which determined the weight 
of the correction). With the sixth trial (see Table 1), $x_n$ reached a 
value of about 25 hours and thereafter stayed at approximately this 
level. 
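
The correction rule of formula (1) can be sketched in a few lines of Python. The function names and the noiseless linear response used in the second part are illustrative assumptions, not part of the original experiment; only the first two corrections reproduce figures quoted in the text.

```python
def robbins_monro_step(x_n, y_n, n, target, c):
    """One correction by formula (1): x_{n+1} = x_n + a_n (y* - y_n), a_n = c / n."""
    return x_n + (c / n) * (target - y_n)


def run_robbins_monro(x1, observe, target, c, trials):
    """Apply the correction repeatedly; `observe` returns the response at x."""
    xs = [x1]
    for n in range(1, trials + 1):
        y_n = observe(xs[-1])
        xs.append(robbins_monro_step(xs[-1], y_n, n, target, c))
    return xs


# The first two corrections quoted in the text (y* = 1.1, weights 20/n).
x2 = robbins_monro_step(30.0, 1.067, 1, target=1.1, c=20.0)
x3 = robbins_monro_step(30.7, 1.30, 2, target=1.1, c=20.0)

# Hypothetical noiseless, increasing response crossing 1.1 at 25 hours
# (an assumption made only to show the sequence drifting toward x*).
def toy_response(x):
    return 1.1 + 0.02 * (x - 25.0)

xs = run_robbins_monro(30.0, toy_response, target=1.1, c=20.0, trials=200)
```

In this toy setting the estimates approach 25 hours from above; with real, noisy observations the path oscillates as in Table 1.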


TABLE 1 
STOCHASTIC APPROXIMATION OF HOURS OF TREATMENT REQUIRED WITH 
1.5 MG/L KINETIN TO PRODUCE AN EXPECTED RATIO OF DIVISIONS 
EQUAL TO 1.1 ($y^* = 1.1$) 

Trial    Hours of treatment    Observed¹ K/C    Weight 
($n$)    ($x_n$)               ($y_n$)          ($a_n$) 
 1       30                    1.067            20 
 2       30.7                  1.30             10 
 3       28.7                  1.131            6.67 
 4       27.3                  1.223            5 
 5       26.6                  1.577            4 
 6       24.8                  1.133            3.33 
 7       24.6                  0.89             2.86 
 8       25.2                  1.00             2.5 
 9       25.5                  0.81             2.22 
10       25.6                  1.31             2 
11       25.1                  1.21             1.82 
12       24.8                  1.03             1.66 
13       24.9                  - 

¹Each K/C represents the mean numbers of divisions for 10 control and 10 test animals. 


The only practical problem we needed to face, apart from the choice 
of the first guess, $x_1$, was how to select the weights, $a_n$. Robbins and 
Monro [4] had essentially proved that any weights proportional to the 
sequence 1, 1/2, 1/3, 1/4, 1/5, etc., will yield a converging sequence of estimates 
for $x^*$. How best to choose the constant of proportionality, in the 
absence of any preliminary parametric assumptions about the regression, 
remains an open question. The weight of 20 for our problem was 
chosen for the following reasons. 

We hypothesized that the first estimate of 30 hours would cor- 
respond to a ratio no greater than 1.5 and no less than 1.0. We expected 
it to be an over-estimate, but not by much more than ten hours. Should 
the 30 hour trial yield an observed ratio of about 1.5, the empirical 
difference from the sought-for ratio would be about 0.4. A weight of 
20, then, would produce a correction of 20 × 0.4 = 8 hours. Should 
even this correction prove too small, the next correction would still 
be relatively sizeable, being 20/2 = 10 times the next observed difference 
$y_2 - y^*$. The weight 20 thus seemed appropriate for bringing the 
experiment quickly to the approximate neighborhood of $x^*$. The 
succeeding terms in the series (whose typical term is 1/n) get smaller 
and smaller as n increases (although not too rapidly, since the series 
nevertheless has a divergent sum), so that each individual correction 
tends to be smaller and smaller. The biggest terms in the series are in 
the beginning, and here is where the largest individual corrections may 
be expected to take place. 
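
The behaviour of the weights described above can be checked numerically. The following is a minimal sketch; the helper name `weights` and the cut-off of 10,000 terms are arbitrary choices made for illustration.

```python
def weights(c, count):
    """Weights a_n = c / n for n = 1, ..., count."""
    return [c / n for n in range(1, count + 1)]


a = weights(20.0, 10_000)

# Size of the first correction if the observed ratio were 1.5 (difference 0.4).
first_correction = a[0] * 0.4

# The partial sums of a_n keep growing (the series diverges) ...
partial_sum = sum(a)

# ... while the partial sums of a_n^2 stay bounded (below 400 * pi^2 / 6).
partial_sum_of_squares = sum(w * w for w in a)
```

The individual weights shrink toward zero, yet their running total grows without bound, which is what allows the sequence to travel any distance from a poor first guess.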

A different choice than 20 for the constant of proportionality could 
have either hastened or slowed up the convergence of the experimental 
sequence, but convergence would take place regardless. Thus, the 
decision on the choice of this constant is not of overriding importance. 
Shrewd guesses or preliminary information are useful only in helping 
to shorten the procedure, but are not essential to making it work. 

Notice in Table 1 that the experimentation was stopped at $n = 13$. 
This was because no appreciable differences appeared among the $x_n$ 
from trial 6 onwards. The mathematical statisticians have still not 
provided a rule for stopping at some finite $n$ with an associated confidence 
interval for $x^*$. This is an important problem. It is hoped that 
its solution will be forthcoming soon. In any event, we do have what 
seems to be an excellent point estimate of $x^*$ from the given data. 
As a partial check on this estimate one can average the observed values 
of $y$ from $n = 6$ onwards: this turns out actually to equal 1.1. 

While the focus was on y* = 1.1 in the above sequence of trials, 
information is yielded incidentally about other points in this region. 
Table 1 gives 12 sample points for the regression of $y$ on $x$, where $x$ 
ranges from 24.6 to 30.7. This affords something of a picture of the 
nature of the regression in this range. Most of the points, of course, 
have abscissas near $x^*$. But still some information,² however meager, 
is available about other values of $x$. 

To illustrate how the Robbins-Monro procedure worked vis-a-vis 
a different ratio of K/C, Table 2 gives the experimental sequence yielded 
for estimating the $x^*$ corresponding to $y^* = 1.05$. In designing this 
experiment, we now had available information from the first series to 
help us choose the initial value $x_1$. Accepting 25 hours as the point 
which yields an average K/C ratio of 1.1, clearly fewer hours should 
be needed to obtain the smaller ratio of 1.05. It was decided to start 
the new series at 23 hours. This was expected to be definitely an 
overestimate of the new $x^*$. The possible range for the difference between 
the corresponding regression value of $y$ from $y^*$ was now known to be 
no greater than 0.05. A larger weight, 30, was chosen (instead of the 
20 of the previous experiments) for the $a_n$, in order to try to reduce 
the hours of treatment as quickly as possible. Should the maximal 
average difference of 0.05 occur in the first trial, then 30 × 0.05 would 
yield a correction of 1.5 hours. Not too many corrections of this 
magnitude were expected to be necessary. 

After nine trials, the $x_n$ seemed to stabilize. Experimentation was 
stopped at trial 16. Curiously enough, the average value for the last 
six observed $y_n$, from the tenth trial onward, precisely equals 1.05, 
giving heuristic confirmation as to the accuracy of the final estimate 
of 12.5 hours for $x^*$. 

The Robbins-Monro procedure illustrated above is appropriate for 
estimating a point $x^*$ at which the unknown regression is strictly 
monotonely increasing: all values of $x$ less than $x^*$ have regression values of 
$y$ less than $y^*$, and all $x$ exceeding $x^*$ have regression values exceeding 
$y^*$. It is appropriate also, of course, for a monotonely decreasing 
regression. Another procedure has been devised by Kiefer and Wolfowitz 
[3] for a regression that both increases and decreases, attaining a unique 
maximum. Their procedure estimates the position of the maximum, 
and is similarly appropriate for a regression with a unique minimum. 


TABLE 2 
STOCHASTIC APPROXIMATION OF HOURS OF TREATMENT REQUIRED WITH 
1.5 MG/L KINETIN TO PRODUCE AN EXPECTED RATIO OF DIVISIONS 
KINETIN/CONTROL EQUAL TO 1.05 ($y^* = 1.05$) 

Trial    Hours of treatment    Observed K/C    Weight 
($n$)    ($x_n$)               ($y_n$)         ($a_n$) 
 1       23                    1.14            30 
 2       20.3                  1.00            15 
 3       21.1                  1.75            10 
 4       14.1                  0.97            7.5 
 5       14.7                  1.26            6 
 6       13.4                  0.98            5 
 7       13.8                  1.04            4.29 
 8       13.8                  0.96            3.75 
 9       13.5                  1.25            3.33 
10       12.8                  1.22            3 
11       12.3                  1.04            2.73 
12       12.3                  [illegible]     2.5 
13       12.2                                  2.3 
14       11.8                  0.97            2.14 
15       12.0                  0.81            2 
16       12.5                  -               - 


²A referee has raised the question as to whether data such as in Table 1 can be used to test the 
underlying hypothesis of a monotone regression of $y$ on $x$. In answer, it should first be stressed that 
one should not use any of the standard tests of significance for trend, in view of the sequential 
dependence among the abscissas. Actually, the convergence of the abscissas by itself serves as a test of 
the hypothesis of trend, and apparently a rather sharp one. For if there were no trend in the relation 
of $y$ to $x$, then the expected value of each $y_n$ would equal $E(y)$, whence, from formula (1) above, 

$$E(x_{n+1}) = x_1 + [y^* - E(y)] \sum_{j=1}^{n} a_j. \qquad (2)$$

If $y^*$ should be chosen to equal $E(y)$, then equality (2) shows that for each $n$ the expected value of $x_n$ 
is the arbitrary initial value $x_1$; so if convergence is away from $x_1$, there must be a trend. If $y^* \neq E(y)$, 
then the expected value of $x_n$ tends to $+\infty$ or to $-\infty$, depending on the sign of the brackets in 
the right of (2), in view of the fact that the sum of the $a_j$ diverges. In this latter case, simply having 
the $x_n$ converge to some finite value is sufficient to prove the existence of trend. 

Paradoxically, while a faster convergence may be taken as better evidence of trend, at the same 
time it will make the trend less apparent to the eye. This happens in both Tables 1 and 2 of this 
paper. The reason is that the stochastic process is interested only in a point estimate, and wants to 
get there as quickly as possible, without trying to display what is happening in the surrounding interval. 
The reader interested in a more visual impression of our data may compare Tables 1 and 2, for these 
establish two distinct points of the same regression. These tables show that the higher value of $y^*$ 
leads to a higher value of $x^*$, or that the trend in our data is in the direction hypothesized. 

The kinetin studies are only partly reported above, and actually 
lead to a further problem in stochastic approximation to which we 
wish to call attention. The underlying regression in these experiments 
was hypothesized to be perfectly flat, neither increasing nor decreasing, 
from zero hours of time up to some unknown point. In this interval, 
the experimental and control groups have identical distributions, and 
so the conditional expected values of K/C are also constant. (This 
expected value of the ratio must be slightly greater than unity, because 
the fact that the expected values of K and of C are equal makes the 
expected value of 1/C greater than the reciprocal of the expected value 
of C.) Only to the right of this unknown point was the regression 
supposed to be strictly increasing. We would have liked to estimate 
the point at which the regression value of the ratio stopped being con- 
stant. It was the present unavailability of an appropriate technique 
that led to the decision to estimate points somewhat further to the 
right of the basically desired one, namely those corresponding to ratios 
of 1.1 and 1.05. It is hoped that a practical solution can be found for 
estimating stochastically where regression flatness ends, as well as for 
deciding how to stop at some finite n. 

It may be helpful to point out one further practical decision made 
for the above experiments. The biological problem was to ascertain 
at how many hours of treatment a difference began to appear between 
the experimental and control groups with respect to number of cell 
divisions. A priori, one could use the difference between the observed 
division frequencies, their ratio, or many other possible statistics. 
Usually, in deciding which statistic to use, the choice is guided by 
distributional considerations. Such considerations could not help us 
here, in view of our need for a distribution-free procedure. The ratio 
statistic was chosen—rather than the difference—because it enabled 
us to dispense with guessing at the order of magnitude of the absolute 
differences in trying to arrive at the constant of proportionality in the 
$a_n$ weights. Deriving and/or computing the distribution of ratios from 
a parent distribution parametrically may often be a forbidding task. 
Such a consideration did not apply here, since the stochastic approxi- 
mation procedure did not need any detailed information about such a 
derived distribution. Were we to proceed more closely to the point 
where the regression stopped being flat, the data underlying Tables 1 
and 2 indicate that we would have to abandon ratios, because very 
often no cell divisions would be observed for the control groups in some 
trial. In such a case, some adjustment would need to be made to 
avoid dividing by zero. But since the above experiments now provide 
some information on the absolute magnitudes involved, one could 
handily switch to the use of differences instead of ratios in further 
experimentation. Such possible flexibility in going from one experiment 
to another is a further feature of stochastic approximation. 


A MULTIPLE COMPARISON RANK SUM TEST: 
TREATMENTS VERSUS CONTROL* 


ROBERT G. D. STEEL 


Mathematics Research Center, U. S. Army 
University of Wisconsin, Madison, Wisconsin, U.S. A. 


SUMMARY 


A multiple comparison rank sum test, for comparing treatments 
with a control in a one-way classification with equal numbers of obser- 
vations, is presented. Both the exact and an approximate distribution 
are discussed. An example and tables of critical values are given. 


1. INTRODUCTION 


Problems of applied research have necessitated the investigation of 
multiple comparison procedures. Such investigations have been carried 
out almost entirely within the framework of the analysis of variance. 

Since the assumptions underlying the analysis of variance are not 
always valid, distribution-free or non-parametric procedures have been 
developed for data arising from a number of experimental designs. 
Most such procedures do not provide for multiple comparisons. 

This paper presents a rank sum multiple comparison test for com- 
paring treatments with a control, when the data are from a one-way 
classification. Error rate is experimentwise. 

An experimentwise error rate is, by definition, the ratio of the 
number of experiments with one or more false significance statements 
to the total number of experiments. Thus, in computing this error 
rate, the experiment is the unit; the experiment which leads to a single 
false significance statement is rated no differently than the one in which 
all comparisons are falsely declared significant. If we set the error 
rate at $\alpha$, then $1 - \alpha$ gives the probability that no false statements 
of significance will be made, in other words, that all statements will 
be correct when the null hypothesis is true. In an experiment where 
k independent comparisons are to be made, it is customary to use a 


*This research was done at the Mathematics Research Center, U. S. Army, Madison, Wisconsin, 
under contract number DA-11-022-ORD-2059. 

per comparison error rate. In this case, the probability that all 
statements will be correct is $(1 - \alpha)^k$, when the null hypothesis is true. 
To choose an experimentwise error rate and a probability level 
customarily associated with a per comparison error rate where the comparisons 
are independent is to impose a degree of caution which increases with 
the number of treatments under test. Yet this is what is usually done. 
No alternative is proposed in this paper though some revision in the 
direction of holding caution at a fixed level or even decreasing it as 
the number of comparisons increases would seem justified. 

The proposed test is a non-parametric analogue of Dunnett’s [1] 
procedure. The procedure was developed to meet the needs of those 
research workers whose experiments generally include a recognized stan- 
dard treatment for comparison with each of p treatments; such inclusion 
is required where environmental conditions may change from experiment 
to experiment. The problem is, then, a special case of multiple com- 
parisons, a case which should require a smaller difference for judging 
significance than that required by a procedure which would, for example, 
test all possible pairs of means. The procedure devised by Dunnett 
takes account of the non-orthogonality of the comparisons, uses a 
single value for test purposes, permits computation of joint confidence 
intervals and calls for an experimentwise error rate and a corresponding 
joint confidence coefficient for either one- or two-sided comparisons 
and confidence intervals respectively. 

The proposed test is also a generalization of Wilcoxon’s [5] test, 
applicable when there are equal numbers of observations for all treat- 
ments. Wilcoxon’s test has been extended by Mann and Whitney 
[3] to apply to two treatments with unequal numbers of observations 
but the present paper is not generalized to this extent. Kruskal and 
Wallis [2] have also generalized the use of ranks to apply to data from 
one-way classifications, but they test only the null hypothesis of no 
differences among population means. 

Whitney [4] proposed a test for comparing a control and two treat- 
ments but chose different alternatives. This is the basic difference 
between Whitney’s procedure and that presented here. Whitney’s 
procedure calls for the acceptance of the null hypothesis or of an alter- 
native that requires both treatment distributions to differ from that 
of the control. 

For rank tests, the usual null hypothesis is that the observations 
come from identical populations. This is sometimes stated as the null 
hypothesis of no differences among treatments but is not equivalent to 
that of no differences among treatment means unless additional 
assumptions are made. The customary assumption for rank tests is that the 
variables have continuous cumulative distributions; it has been 
conjectured that they are fairly insensitive to heterogeneity of variance. 

The usual alternative hypotheses are based on the definition of 
“stochastically larger.” Mann and Whitney [3] describe a continuous 
random variable $X$ as stochastically larger than another, $Y$, when 
$F(a) < G(a)$ for all $a$, where $F$ and $G$ are the cumulative distributions 
of $X$ and $Y$ respectively. This definition calls for every percentile of 
the distribution of $X$ to be larger than the corresponding percentile of 
$Y$. Thus the test is for differences in location. In particular, we may 
wish to consider the procedure as one that tests for differences in the 
location of medians. If we assume that the distributions differ only in 
location, then the test is also a test of the location of means. 


2. PROCEDURE 


Let $X_0$ and $X_i$, $i = 1, \ldots, k$, be random variables measuring some 
characteristic of a control and $k$ treatments. Assume they have 
continuous cumulative distribution functions which we denote by $F_0$ and 
$F_i$, $i = 1, \ldots, k$. On the basis of $n$ observations on each variable, 
we wish to test the null hypothesis 

$$H_0: F_0 = F_1 = \cdots = F_k,$$

which includes the equality of medians and means, against one of the 
following alternatives requiring that one random variable be 
stochastically larger than another: 

(i) $H_1: F_0 < F_i$, at least one $i$, 
(ii) $H_1: F_0 > F_i$, at least one $i$, 
(iii) $H_1: F_0 \neq F_i$, at least one $i$. 


Alternative (i) calls for the median (as well as the other percentiles) 
of at least one of the treatment distributions to be less than that of the 
control; the treatment distribution is located to the left of the control. 
If the distributions are assumed to differ only in location, then this 
alternative calls for the mean of at least one of the treatment distri- 
butions to be less than that of the control. 

Alternative (ii) calls for at least one of the treatment distributions 
to be located to the right of the control. Alternative (iii) calls for at 
least one of the treatment distributions to be located differently than 
the control. 

The test procedure is as follows. 

1. Rank jointly the $X_0$’s and $X_1$’s, giving rank 1 to the least 
observation. 

2. Add ranks for one of the variables to give $T_1$ and compute the 
conjugate of $T_1$, namely $T_1' = (2n + 1)n - T_1$. (The minimum of 
these is ordinarily used.) 

3. Repeat steps 1 and 2 for the remaining $X_i$’s, that is for 
$i = 2, \ldots, k$. 

4. Compare each of the quantities $\min(T_i, T_i')$ with the 
appropriate tabulated value as determined by the desired experimentwise error 
rate and type of critical region, and by $k$ and $n$. A significance 
statement is made for each of the $k$ comparisons. 

In theory, ties do not occur. In practice, they do, but need affect 
the rank sums only when control and treatment observations are tied. 
In this case, they are assigned the average of the ranks the observations 
would have if distinguishable. This makes additional rank sums 
possible and so has some effect, presumed small, on tabulated confidence 
coefficients. 

In applying the procedure against one-sided alternatives, observe 
whether $\min(T_i, T_i')$ is associated with the control or a treatment. 
Otherwise, a treatment being tested for a response significantly greater 
than control may be declared so when the evidence is for a significantly 
smaller one. 


3. EXAMPLE 


The accompanying data* are Binet IQ scores of 3-year old female, 
white, private patients. Treatments are categories to which the children 
were assigned at birth. It is desired to test which treatments result 
in lowering the IQ below normal; comparisons among treatments are 
not considered to be of special interest. 


Normal              Anoxic      Rh negative    Premature 

103 (5, 3, 4)       119 (10)     89 (2)         92 (2) 
111 (7, 5, 6)       100 (4)     132 (11)       114 (7½) 
136 (12, 12, 12)     97 (3)      86 (1)         86 (1) 
106 (6, 4, 5)        89 (2)     114 (7)        119 (9) 
122 (11, 9, 10)     112 (8)     114 (7)        131 (11) 
114 (9, 7, 7½)       86 (1)     125 (10)        94 (3) 


The Normals and Anoxics are ranked from lowest to highest, then 
Normals and Rh negatives, then Normals and Prematures. Ranks 
are shown in brackets. Ties, where they occur, are assigned a mean 
rank for their group. Since alternatives are one-sided and, if true, 

*Data available through courtesy of Dr. Frances Graham, University of Wisconsin School of 
Medicine. 

tend to lower the rank sums for the treatments, we add only the 
treatment ranks; there is no need to find $\min(T_i, T_i')$. The sums are $T_1 = 28$, 
$T_2 = 38$, and $T_3 = 33.5$ respectively. For $k = 3$, $n = 6$, the 5% 
tabulated value is 26. Since the minimum observed $T_i$ is $T_1 = 28$, there 
is not sufficient evidence to conclude that any one of the treatments 
lowers IQ values. 
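
The rank sums of this example can be reproduced with a short Python sketch. The helper names (`midranks`, `treatment_rank_sum`, `conjugate`) are illustrative, and the critical value 26 is simply the tabulated value quoted above for $k = 3$, $n = 6$.

```python
def midranks(values):
    """Rank 1 goes to the smallest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + 1 + end + 1) / 2  # mean of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = average
        pos = end + 1
    return ranks


def treatment_rank_sum(control, treatment):
    """Rank control and one treatment jointly; return the treatment rank sum."""
    ranks = midranks(list(control) + list(treatment))
    return sum(ranks[len(control):])


def conjugate(t, n):
    """Conjugate rank sum T' = (2n + 1)n - T for n observations per group."""
    return (2 * n + 1) * n - t


normal = [103, 111, 136, 106, 122, 114]
treatments = {
    "Anoxic": [119, 100, 97, 89, 112, 86],
    "Rh negative": [89, 132, 86, 114, 114, 125],
    "Premature": [92, 114, 86, 119, 131, 94],
}

rank_sums = {name: treatment_rank_sum(normal, obs) for name, obs in treatments.items()}

# One-sided test for lowered IQ: a treatment rank sum at or below the
# tabulated value (26 for k = 3, n = 6 at the 5% level) would be significant.
significant = {name: t <= 26 for name, t in rank_sums.items()}
```

Because the alternatives are one-sided, only the treatment rank sums are compared with the tabulated value; `conjugate` is included for the two-sided case, where the smaller of $T_i$ and $T_i'$ is used.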

The same conclusion is drawn if an underlying normal distribution 
is assumed and Dunnett’s procedure applied. The means are 115.3, 
100.5, 110.0, and 106.0 respectively; the observed control minus 
treatment differences are 14.8, 5.3, and 9.3. The residual error mean 
square is 245.8 with 20 df. The tabulated value of Dunnett’s $t$ for 
one-sided comparisons with three treatments excluding the control, and 
20 df is 2.19 for a joint confidence coefficient of .95. This gives a critical 
value of $2.19\sqrt{2(245.8)/6} = 19.82$ units on the Binet scale. None 
of the differences is statistically significant. 
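
The arithmetic of the parametric comparison can be retraced as follows. This is a sketch only: the variable names are illustrative, and the value 2.19 for Dunnett's $t$ is taken from the text rather than computed.

```python
import math

groups = {
    "Normal": [103, 111, 136, 106, 122, 114],
    "Anoxic": [119, 100, 97, 89, 112, 86],
    "Rh negative": [89, 132, 86, 114, 114, 125],
    "Premature": [92, 114, 86, 119, 131, 94],
}
n = 6  # observations per group

means = {name: sum(obs) / len(obs) for name, obs in groups.items()}

# Residual (within-group) mean square of the one-way classification.
sse = sum((x - means[name]) ** 2 for name, obs in groups.items() for x in obs)
df = sum(len(obs) for obs in groups.values()) - len(groups)
mse = sse / df

# Critical difference t * sqrt(2 * MSE / n), with t = 2.19 as quoted in the text.
t_dunnett = 2.19
critical = t_dunnett * math.sqrt(2 * mse / n)

# Control minus treatment differences.
differences = {
    name: means["Normal"] - means[name] for name in groups if name != "Normal"
}
```

Each difference falls short of the critical value, in agreement with the rank sum test.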


4. DISTRIBUTION OF MIN {MIN ($T_i$, $T_i'$)} 


First, consider the distribution of $(T_1, \ldots, T_k)$ where $T_i$ is the 
sum of the ranks assigned to the $X_0$’s when ranked with the $X_i$’s. 
We require the number of ways in which $(T_1, \ldots, T_k)$ may be obtained. 

For this purpose, consider the following procedure. Order the 
$k + 1$ sets of observations jointly by size, placing the smallest 
observation first. Assign the ranks $1, \ldots, 2n$ to the $n$ $X_0$’s and $n$ $X_1$’s, ignoring 
all other observations. Repeat for the $X_0$’s and $X_2$’s, and so on, till 
each set of treatment observations has been assigned ranks jointly with 
the control. Compute $(T_1, \ldots, T_k)$. If this procedure were repeated 
for all possible permutations of the $(k + 1)n$ observations, then the 
distribution of $(T_1, \ldots, T_k)$ would be generated. 

It is clear that any value of $(T_1, \ldots, T_k)$ is obtained from more 
than one ordering. For example, if the $X_0$’s occupy positions 1 to $n$, 
then $(T_1, \ldots, T_k) = [n(n + 1)/2, \ldots, n(n + 1)/2]$ arises from each 
of the $(kn)!/(n!)^k$ permutations of the remaining observations in 
positions $(n + 1), \ldots, (k + 1)n$. The problem is, in part, to find a 
reasonable method for determining numbers of permutations for any value 
of $(T_1, \ldots, T_k)$. 

For each ordering of any complete set of observations, that is, a 
set of $n_0$ $X_0$’s, $n_1$ $X_1$’s, $\ldots$, $n_k$ $X_k$’s, if the last observation is an $X_0$, 
then ranks $n_0 + n_1, \ldots, n_0 + n_k$ are in $T_1, \ldots, T_k$ respectively, 
and the rest of $T_i$, viz. $T_i - n_0 - n_i$, must be found by drawing 
$n_0 - 1$ ranks from $n_0 - 1 + n_i$ ranks; if the last observation is an 
$X_1$, then the rank $n_0 + n_1$ is not in $T_1$ and $T_1$ must be found by drawing 
$n_0$ ranks from $n_0 + n_1 - 1$ ranks. The number of ways in which 
$(T_1, \ldots, T_k)$ may be obtained is the sum of these possibilities. This 
gives equation (1). 

$$W^{n_0, n_1, \ldots, n_k}_{T_1, \ldots, T_k} = W^{n_0 - 1, n_1, \ldots, n_k}_{T_1 - n_0 - n_1, \ldots, T_k - n_0 - n_k} + \sum_{i=1}^{k} W^{n_0, n_1, \ldots, n_i - 1, \ldots, n_k}_{T_1, \ldots, T_k} \qquad (1)$$


$W^{n_0, n_1, \ldots, n_k}_{T_1, \ldots, T_k}$ is the number of ways of obtaining $(T_1, \ldots, T_k)$ by adding 
the ranks assigned to the $X_0$’s in each ranking of $n_0$ $X_0$’s with $n_i$ $X_i$’s, 
$i = 1, \ldots, k$. This formula may be used recursively. If any subscript 
becomes less than $\sum_{j=1}^{n_0} j$, then the value of that $W$ becomes zero. 
If the value of any superscript becomes zero, then the value of that 
$W$ becomes zero if the corresponding $T_i$ is larger than the possible 
minimum; if it equals the minimum, then the zero and corresponding 
$T_i$ are deleted, for example, [...] = [...] because [...] = 15. 
For $k$ treatments and one control with $n$ observations on each, 
the total number of possible arrangements is given by equation (2). 

$$W = \frac{[(k + 1)n]!}{(n!)^{k+1}} \qquad (2)$$

Equations (1) and (2) give the distribution of $(T_1, \ldots, T_k)$. Equation 
(2) may be generalized as $W = (\sum_i n_i)! / \prod_i (n_i!)$. 

From the distribution of $(T_1, \ldots, T_k)$, we may obtain the 
distribution of $\min\{T_i\}$. This is likely to be a lengthy procedure when one 
considers the number of possible terms, even though there is considerable 
symmetry due to the common value of $n$. In addition, factorial numbers 
increase rapidly. Some help is available through application of 
equation (3). 

$$\begin{aligned}
W(\text{at least one } T_i = S) &= k\,W(T_1 = S) - \binom{k}{2} W(T_1 = T_2 = S) + \cdots + (-1)^{k+1} W(T_1 = \cdots = T_k = S) \\
&= k \binom{(k+1)n}{2n} \frac{[(k-1)n]!}{(n!)^{k-1}}\, W^{n,n}_{S} - \binom{k}{2} \binom{(k+1)n}{3n} \frac{[(k-2)n]!}{(n!)^{k-2}}\, W^{n,n,n}_{S,S} + \cdots + (-1)^{k+1} W^{n,\ldots,n}_{S,\ldots,S} \qquad (3)
\end{aligned}$$

There are $k + 1$ $n$’s and $k$ $S$’s in the last expression. 
Equation (3) is most useful when $S = \sum_{j=1}^{n} j$ for it then gives us 
a term in the required distribution. For all other values of $S$, the event, 
for which the probability is computed by equation (3), includes those 
where both $S$ and any other value of $T_i$, in particular, smaller values, 
are $T$-values. Table 1 was prepared by the procedure described in 
this section and checked by direct combinatorial methods. The former 
method is straightforward but lengthy; the latter method requires 
considerably more concentration and becomes increasingly difficult to 
apply as $k$ increases and as any $T_i$ becomes larger. 
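
For small cases the counts can be checked by brute force. The sketch below enumerates every distinct arrangement of the $(k + 1)n$ labels and compares the result with the inclusion-exclusion count of equation (3) at the minimum rank sum $S = n(n + 1)/2$. All function names are illustrative, and the closed form assumes that at this minimum each $W^{n,\ldots,n}_{S,\ldots,S}$ with $j$ subscripts equals $(jn)!/(n!)^j$, as in the example given earlier in this section.

```python
from itertools import combinations
from math import comb, factorial


def arrangements(k, n):
    """Yield each distinct ordering of n labels for each of groups 0..k (0 = control)."""
    total = (k + 1) * n
    labels = [0] * total

    def place(group, free):
        if group == k:
            for p in free:
                labels[p] = group
            yield tuple(labels)
            return
        for chosen in combinations(free, n):
            for p in chosen:
                labels[p] = group
            taken = set(chosen)
            yield from place(group + 1, [p for p in free if p not in taken])

    yield from place(0, list(range(total)))


def control_rank_sum(labels, i):
    """T_i: sum of the control ranks when ranked jointly with treatment i only."""
    rank, total = 0, 0
    for label in labels:
        if label == 0 or label == i:
            rank += 1
            if label == 0:
                total += rank
    return total


def count_min_equal(k, n, s):
    """Number of arrangements in which the smallest T_i equals s."""
    return sum(
        1
        for labels in arrangements(k, n)
        if min(control_rank_sum(labels, i) for i in range(1, k + 1)) == s
    )


def equation3_at_minimum(k, n):
    """Inclusion-exclusion count of 'at least one T_i at its minimum'."""
    big_n = (k + 1) * n
    total = 0
    for j in range(1, k + 1):
        others = factorial((k - j) * n) // factorial(n) ** (k - j)
        w_joint = factorial(j * n) // factorial(n) ** j
        total += (-1) ** (j + 1) * comb(k, j) * comb(big_n, (j + 1) * n) * others * w_joint
    return total


ways_k2_n3 = count_min_equal(2, 3, 6)
formula_k2_n3 = equation3_at_minimum(2, 3)
formula_k3_n3 = equation3_at_minimum(3, 3)
```

Enumeration becomes slow quickly (there are 369,600 arrangements already for $k = 3$, $n = 3$), which is why the closed form is used for the larger case here.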


TABLE 1 
SOME EXACT PROBABILITIES FOR A RANK SUM TEST 
($W_{T,Z}$ AND $W_{T,Z,Z}$ REPRESENT THE NUMBER OF WAYS OF OBTAINING AT LEAST 
ONE $T_i = T$ BUT NO $T_j < T$) 

n   k   Term             Number of ways    Probability   Cumulative probability 
3   2   $W_{6,Z}$        148               .0881         .0881 
3   2   $W_{7,Z}$        136               .0810         .1690 
3   2   $W$              1,680 
3   3   $W_{6,Z,Z}$      43,920            .1188         .1188 
3   3   $W$              369,600 

4   2   $W_{11,Z}$       880               .0254         .0519 
4   2   $W_{12,Z}$       1,684                           .1005 
4   2   $W_{13,Z}$       2,232 
4   2   $W$              34,650 
4   3   $W_{11,Z,Z}$     2,172,030         .0344         .0718 
4   3   $W_{12,Z,Z}$     4,539,912                       .1438 
4   3   $W$              63,063,000 

5   2   $W_{15,Z}$       5,754             .0076         .0076 
5   2   $W_{16,Z}$       5,614             .0074         .0150 
5   2   $W_{17,Z}$       10,968            .0145         .0295 
5   2   $W_{18,Z}$       15,992            .0211         .0506 
5   2   $W_{19,Z}$       55,770            .0737         .1243 
5   2   $W$              756,756 
5   3   $W_{15,Z,Z}$     135,522,072       .0116         .0116 
5   3   $W_{16,Z,Z}$     118,434,204       .0101         .0216 
5   3   $W_{17,Z,Z}$     227,292,918       .0194         .0410 
5   3   $W$              11,732,745,024 


5. AN APPROXIMATION 


The parent distribution depends upon $k$ sets of the numbers $1, 
2, \ldots, 2n$. These $k$ sets are not independent since the control 
treatment supplies $n$ of the numbers in each set. Let us next obtain the 
means, variances, and covariances of the joint distribution of 
$(T_1, \ldots, T_k)$. To do this, it is sufficient to examine the distribution 
for $k = 2$. This will now be done for the general case of $l$, $m$, and $n$ 
observations on the control and two treatments respectively. 

For $k = 1$, it is easy to show that the mean and variance are given 
by equations (4) and (5) respectively. 

$$\mu_T = E(T) = l(l + m + 1)/2 \qquad (4)$$

$$\sigma_T^2 = E(T^2) - \mu_T^2 = lm(l + m + 1)/12. \qquad (5)$$

Hence, for $k = 2$, we have the initial conditions, $E_{l,m,0}(T_1) = 
l(l + m + 1)/2$ and $E_{l,m,0}(T_1 - \mu_1)^2 = lm(l + m + 1)/12$. In addition, 
$E_{l,0,n}(T_1) = l(l + 1)/2$, $E_{l,0,n}(T_1 - \mu_1)^2 = 0$. 

Similar initial conditions hold with $T_2$. 

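To show that equations (4) and (5) give the mean and variance 
when $k = 2$, the sole additional requirement is that equations (4) and 
(5) satisfy the obvious recursion formulas based on equation (1). 

For the mean of $T_1$, 

$$\begin{aligned}
\mu_1 = E(T_1) &= \frac{l!\,m!\,n!}{(l+m+n)!} \Big\{ \sum T_1 W^{l-1,m,n}_{T_1-l-m,\,T_2-l-n} + \sum T_1 W^{l,m-1,n}_{T_1,T_2} + \sum T_1 W^{l,m,n-1}_{T_1,T_2} \Big\} \\
&= \frac{l!\,m!\,n!}{(l+m+n)!} \Big\{ \Big[ \frac{(l-1)(l+m)}{2} + (l+m) \Big] \frac{(l+m+n-1)!}{(l-1)!\,m!\,n!} + \frac{l(l+m)}{2} \frac{(l+m+n-1)!}{l!\,(m-1)!\,n!} + \frac{l(l+m+1)}{2} \frac{(l+m+n-1)!}{l!\,m!\,(n-1)!} \Big\} \\
&= l(l + m + 1)/2.
\end{aligned}$$

Similarly, $\mu_2 = E(T_2) = l(l + n + 1)/2$. 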
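For the variance of $T_1$, 

$$\begin{aligned}
E(T_1^2) &= \frac{l!\,m!\,n!}{(l+m+n)!} \Big\{ \sum (T_1-l-m)^2 W^{l-1,m,n}_{T_1-l-m,\,T_2-l-n} + 2(l+m) \sum (T_1-l-m) W^{l-1,m,n}_{T_1-l-m,\,T_2-l-n} \\
&\qquad + (l+m)^2 \sum W^{l-1,m,n}_{T_1-l-m,\,T_2-l-n} + \sum T_1^2 W^{l,m-1,n}_{T_1,T_2} + \sum T_1^2 W^{l,m,n-1}_{T_1,T_2} \Big\} \\
&= \frac{l!\,m!\,n!}{(l+m+n)!} \Big\{ \big[ \sigma_1^2(l-1,m,n) + \mu_1^2(l-1,m,n) + 2(l+m)\mu_1(l-1,m,n) + (l+m)^2 \big] \frac{(l+m+n-1)!}{(l-1)!\,m!\,n!} \\
&\qquad + \big[ \sigma_1^2(l,m-1,n) + \mu_1^2(l,m-1,n) \big] \frac{(l+m+n-1)!}{l!\,(m-1)!\,n!} + \big[ \sigma_1^2(l,m,n-1) + \mu_1^2(l,m,n-1) \big] \frac{(l+m+n-1)!}{l!\,m!\,(n-1)!} \Big\},
\end{aligned}$$

so that 

$$\sigma_1^2 = E(T_1^2) - E^2(T_1) = \frac{lm(l + m + 1)}{12}.$$

Similarly, $\sigma_2^2 = E(T_2^2) - E^2(T_2) = ln(l + n + 1)/12$. 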
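For the covariance, the initial conditions are 
$E_{l,m,0}(T_1T_2) = \{l(l + 1)/2\}E(T_1) = l^2(l + 1)(l + m + 1)/4$, 
$E_{l,0,n}(T_1T_2) = \{l(l + 1)/2\}E(T_2) = l^2(l + 1)(l + n + 1)/4$, and $E_{0,m,n}(T_1T_2) = 0$. 

The initial conditions and obvious recursion formula are satisfied by 
the relation, 

$$E(T_1T_2) = lmn/12 + l^2(l + m + 1)(l + n + 1)/4.$$

Thus, 

$$\begin{aligned}
E(T_1T_2) &= \frac{l!\,m!\,n!}{(l+m+n)!} \Big\{ \sum (T_1-l-m)(T_2-l-n) W^{l-1,m,n}_{T_1-l-m,\,T_2-l-n} + (l+n) \sum (T_1-l-m) W^{l-1,m,n}_{T_1-l-m,\,T_2-l-n} \\
&\qquad + (l+m) \sum (T_2-l-n) W^{l-1,m,n}_{T_1-l-m,\,T_2-l-n} + (l+m)(l+n) \sum W^{l-1,m,n}_{T_1-l-m,\,T_2-l-n} \\
&\qquad + \sum T_1T_2 W^{l,m-1,n}_{T_1,T_2} + \sum T_1T_2 W^{l,m,n-1}_{T_1,T_2} \Big\} \\
&= \frac{lmn}{12} + \frac{l^2(l+m+1)(l+n+1)}{4}.
\end{aligned}$$

Hence $\sigma_{12} = E(T_1 - \mu_1)(T_2 - \mu_2) = lmn/12$. 
Finally $\rho^2$ is given by equation (6). 

$$\rho^2 = \frac{mn}{(l + m + 1)(l + n + 1)} \qquad (6)$$

[It is interesting to note that, for the normal distribution, $\rho^2 = mn/\{(l + m)(l + n)\}$.] 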
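
The moment formulas can be verified exactly for small group sizes by enumerating all arrangements. The following sketch uses exact fractions; the function names and the choice $l = 2$, $m = 3$, $n = 1$ are arbitrary illustrations.

```python
from fractions import Fraction
from itertools import combinations


def arrangements(sizes):
    """Yield each distinct ordering of labels 0, 1, 2, ... with the given group sizes."""
    total = sum(sizes)
    labels = [0] * total

    def place(group, free):
        if group == len(sizes) - 1:
            for p in free:
                labels[p] = group
            yield tuple(labels)
            return
        for chosen in combinations(free, sizes[group]):
            for p in chosen:
                labels[p] = group
            taken = set(chosen)
            yield from place(group + 1, [p for p in free if p not in taken])

    yield from place(0, list(range(total)))


def control_rank_sum(labels, i):
    """Sum of the control (label 0) ranks when ranked jointly with group i only."""
    rank, total = 0, 0
    for label in labels:
        if label == 0 or label == i:
            rank += 1
            if label == 0:
                total += rank
    return total


def exact_moments(l, m, n):
    """Exact means, variances and covariance of (T_1, T_2) under the null hypothesis."""
    pairs = [
        (control_rank_sum(a, 1), control_rank_sum(a, 2)) for a in arrangements([l, m, n])
    ]
    count = len(pairs)
    mean1 = Fraction(sum(t1 for t1, _ in pairs), count)
    mean2 = Fraction(sum(t2 for _, t2 in pairs), count)
    var1 = Fraction(sum(t1 * t1 for t1, _ in pairs), count) - mean1 ** 2
    var2 = Fraction(sum(t2 * t2 for _, t2 in pairs), count) - mean2 ** 2
    cov = Fraction(sum(t1 * t2 for t1, t2 in pairs), count) - mean1 * mean2
    return mean1, var1, mean2, var2, cov


l, m, n = 2, 3, 1
mean1, var1, mean2, var2, cov = exact_moments(l, m, n)
rho_squared = cov ** 2 / (var1 * var2)
```

For these sizes the enumeration gives $\rho^2 = 1/8$, the value of equation (6).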


For the joint distribution with $n_0 = n_1 = \cdots = n_k = n$, say, 
equations (7) hold: 

$$\mu_T = E(T) = \frac{n(2n + 1)}{2}$$

$$\sigma_T^2 = E(T^2) - \mu_T^2 = n^2(2n + 1)/12 \qquad (7)$$

$$\rho = n/(2n + 1).$$


An obvious approximation to the required distribution is to assume
that $(T_1, \cdots, T_k)$ is from a multivariate normal distribution and that
$(\min T_i - \mu_T)/\sigma_T$ is distributed approximately as Dunnett's [1] $t$ for
infinite degrees of freedom, simply ignoring the difference between the
true value of $\rho$ and the value used by Dunnett, viz. $\rho = 0.5$. Tables 2
and 3 are constructed on this basis by use of equation (8).

$$\text{Tabulated } T = \text{Integral part of } (\mu_T - t\sigma_T). \qquad (8)$$

$\mu_T$ and $\sigma_T$ are obtained from equation (7), $t$ from Dunnett's tables.
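
As a rough illustration of equation (8), the sketch below computes the tabulated value from $\mu_T = n(2n+1)/2$ and $\sigma_T^2 = n^2(2n+1)/12$, the usual rank-sum moments for two samples of size $n$. The function name `tabulated_T` is hypothetical, and the multiplier $t = 1.92$ is an assumed value for Dunnett's one-sided 5% point with $k = 2$ and infinite degrees of freedom; it should be read from Dunnett's tables in practice.

```python
import math


def tabulated_T(n, t):
    """Integral part of mu_T - t * sigma_T for equal sample sizes n."""
    mu = n * (2 * n + 1) / 2
    sigma = math.sqrt(n * n * (2 * n + 1) / 12)
    return math.floor(mu - t * sigma)


# Assumed Dunnett multiplier (k = 2, one-sided, 5% point, infinite d.f.).
t_assumed = 1.92
value_n4 = tabulated_T(4, t_assumed)
value_n5 = tabulated_T(5, t_assumed)
```

With this assumed multiplier the sketch gives 11 for $n = 4$ and 18 for $n = 5$, the values discussed in the comparison that follows.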
A comparison of Tables 1, 2, and 3 shows the following: 
1. For k = 2, n = 4, Table 2 gives T = 11 as significant at the
5% point (one-sided alternatives). The exact probability is given in
Table 1 as .0519. Table 3 gives T = 10 as significant at the 5% point
(two-sided alternatives). The exact probability from Table 1 is
2(.0266) = .0531.


TABLE 2

SIGNIFICANT VALUES OF RANK SUMS: JOINT CONFIDENCE COEFFICIENTS
OF .95 AND .99 FOR ONE-SIDED ALTERNATIVES

k = number of treatments (excluding control)

 n      2    3    4    5    6    7    8    9
 4     11   10   10   10   10
 5     18   17   17   16   16   16   16   15
 6     27   26   25   25   24   24   24   23
       23   22   21   21   --
 7     37   36   35   35   34   34   33   33
       32   31   30   30   29   29   29   29
 8     49   48   47   46   46   45   45   44
       43   42   41   40   40   40   39   39
 9     63   62   61   60   59   59   58   58
       56   55   54   53   52   52   51   51
10     79   77   76   75   74   74   73   72
       71   69   68   67   66   66   65   65
11     97   95   93   92   91   90   90   89
       87   85   84   83   82   81   81   80
12    116  114  112  111  110  109  108  108
      105  103  102  100   99   99   98   98
13    138  135  133  132  130  129  129  128
      125  123  121  120  119  118  117  117
14    161  158  155  154  153  152  151  150
      147  144  142  141  140  139  138  137
15    186  182  180  178  177  176  175  174
      170  167  165  164  162  161  160  160
16    213  209  206  204  203  201  200  199
      196  192  190  188  187  186  185  184
17    241  237  234  232  231  229  228  227
      223  219  217  215  213  212  211  210
18    272  267  264  262  260  259  257  256
      252  248  245  243  241  240  239  238
19    304  299  296  294  292  290  288  287
      282  278  27.  273  271  270  268  267
20    339  333  330  327  325  323  322  320
      315  310  307  305  303  301  300  299


2. For k = 2, n = 5, Table 2 gives T = 18 and T = 15 as significant
at the 5% and 1% points (one-sided alternatives), respectively.
The exact probabilities are given in Table 1 as .0506 and .0076. Table
3 gives T = 16 as significant at the 5% point (two-sided alternatives).
The exact probability from Table 1 is 2(.0150) = .0300. The exact
probability from Table 1 for T = 17 is 2(.0295) = .0590.


TABLE 3

SIGNIFICANT VALUES OF RANK SUMS: JOINT CONFIDENCE COEFFICIENTS
OF .95 AND .99 FOR TWO-SIDED ALTERNATIVES

k = number of treatments (excluding control)

 n      2    3    4    5    6    7    8    9
 5     16   16   15   15
 6     25   24   23   pa!  22   22   22   21
       21   --   --   --   --
 7     35   33   33   32   32   31   31   30
 8     46   45   44   43   43   42   42   4
       41   40   39   38   38   37   37
 9     60   58   57   56   55   55   54   54
       53   52   51   50   49   49   49   48
10     7    73   72   7    69   69   68
       68   66   65   64   63   62   62   62
11     9    90   88   87   8    85   84
       84   82   80   79   77   7
12    111  108  107  105  104  103  103  102
      101   99   96   95   94   94   93
13    132  129  127  125  124  123  122  121
      121  116   15    4   12  112
14    154  151  149  147  145  144  144  143
15    179  175  172  171  169  168  167  166
      165  162  159  158  156  155  154  154
16    205  201  196  196  194  193  192  191
      189  186  184  182  180  179  178  177
17    233  228  225  223  221  219  218  217
      216  212  210  2008S 205  204  208
18    263  258  254  252  250  248  247  246
      244  240  2380
19    294  289  285  283  280  279  277  276
20    328  322  318  315  313  311  309  308


3. For k = 3, n = 4, Table 2 gives T = 10 as significant at the
5% point (one-sided alternatives). The exact probability is given in
Table 1 as .0373. The exact probability for T = 11 is .0720.

4. For k = 3, n = 5, Table 2 gives T = 17 as significant at the
5% point (one-sided alternatives). The exact probability is given in
Table 1 as .0410. Table 3 gives T = 16 as significant at the 5% point
(two-sided alternatives). The exact probability from Table 1 is
2(.0216) = .0433. The exact probability for T = 15 is .0116.

Thus, for the few values of T for which exact probabilities have 
been computed, the approximate values of T are the same or sufficiently 
close for practical purposes.


AN APPROACH TO THE ANALYSIS OF DATA FOR
SEMI-QUANTAL RESPONSES IN BIOLOGICAL ASSAY


J. R. ASHFORD 


Pneumoconiosis Field Research, National Coal Board 
London, England 


1. Introduction and Summary 


In a typical biological assay a population of living organisms is 
exposed to varying quantities of the material under investigation and 
the reaction of the various individual members of the population is 
subsequently correlated with the applied dose. The methods of 
statistical analysis used to handle data of this kind may be divided 
into two broad classes, depending on whether the reaction of the indivi- 
dual organism is expressed in quantitative or quantal terms. In certain 
circumstances it is apparent that the underlying reaction is essentially 
continuous, although owing to difficulties of observation it can only 
be measured in terms of a quantal response. When these conditions 
apply, the observed quantal response may be defined by one or more 
subdivisions of the underlying reaction scale. A single subdivision 
corresponds to a “binomial” response, the subject being classified as 
“responding” or “not responding” according to whether or not the 
underlying reaction is in excess of some specified value. Two or more 
subdivisions correspond to a “multinomial” or ‘‘semi-quantal”’ response. 

The problem of relating an observed quantal response to an underlying
quantitative reaction has been considered briefly by Finney
[1952] and, in more detail, by Hewlett and Plackett [1956] who show 
that the quantal dosage-response relationship may be derived from 
the corresponding quantitative dosage-response relationship. The ideas 
put forward by Hewlett and Plackett, though plausible, have not, 
however, been verified experimentally in any particular bioassay situa- 
tion. Aitchison and Silvey [1957] describe a method of analysis for 
multiple response data, but these authors do not consider the problem 
from the point of view of an underlying quantitative reaction. 

The purpose of this paper is to examine the application in reverse
of some of the results given by Hewlett and Plackett, first to determine
the effect of errors of measurement of both dosage and reaction on the
quantal relationship, and secondly to estimate the parameters of the 
underlying quantitative dosage-response relationship from the data of 
a multinomial quantal response. The practical application of the 
proposed methods of analysis is illustrated by an example from the 
field of research into the causes of pneumoconiosis amongst coal miners. 


2. The Corresponding Quantitative and Quantal Relationships 


It will be assumed that there exists an underlying quantitative 
dosage-response relationship of the form, 


$$y = g(x), \qquad (1)$$


where x and y are respectively monotonic increasing functions of the 
applied dose and the measured reaction, denoted the dosage and 
response metameters. For any specified value of x within the range 
of doses under consideration the corresponding value of $y$ is assumed
to be normally distributed, with mean $g(x)$ and constant variance $\sigma^2$,
where $g(x)$ is a monotonic increasing function of $x$. These properties
are commonly observed in quantitative dosage-response relationships 
based on the measurement of response on a continuous scale (Finney, 
[1952]). 

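From (1), the probability that the response metameter $y$ exceeds
any specified value, say $y_1$, is given by the expression,

$$P_1(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{y_1}^{\infty} \exp\left\{-\tfrac{1}{2}[y - g(x)]^2/\sigma^2\right\} dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\{g(x) - y_1\}/\sigma} \exp\{-\tfrac{1}{2}t^2\}\, dt. \qquad (2)$$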


If the quantal response (of the binomial type) is defined by $y > y_1$,
expression (2) describes the relationship between the dosage metameter
$x$ and the probability of response $P_1(x)$. In the terminology used
by Hewlett and Plackett [1956] the “critical graded response” is thus
assumed to be constant for each organism. The “normal equivalent
deviate” $Y_1$ of the probability $P_1$ is defined by the expression,

$$P_1 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Y_1} \exp\{-\tfrac{1}{2}t^2\}\, dt, \qquad (3)$$


and the quantal dosage-response relationship corresponding to (1) is
therefore of the form,

$$Y_1 = [g(x) - y_1]/\sigma. \qquad (4)$$

Equation (4) is equivalent to a result quoted by Hewlett and Plackett
for the case of a constant critical graded response.

Thus, if $g(x)$ is a polynomial in $x$, the quantal response relationship
is a similar polynomial, in which all the coefficients are divided by $\sigma$
and the constant term reduced by an amount $y_1/\sigma$.

When g(x) is a linear function of x, the corresponding quantal 
response relationship is also linear and the slope of the quantal response 
line is given by the ratio of the slope of the corresponding quantitative 
response line to the standard deviation of the underlying quantitative 
dosage-response relationship; it is therefore independent of the particular
point $y_1$ on the quantitative scale which defines the quantal
response.

It also follows from (4) that the value of the dosage metameter
$x$ corresponding to any particular probability of response, say $P_0$,
is given by a solution $x_0$ of the equation,

$$g(x_0) = y_1 + \sigma Y_0. \qquad (5)$$

With the exception of the ED50, which corresponds to $P_0 = \tfrac{1}{2}$ and
$Y_0 = 0$, $x_0$ is a function of $\sigma$ as well as of the parameters contained
in $g(x)$. As might be expected, the ED50 is the value of the dose
corresponding to the point on the quantitative scale which defines the
quantal response.


3. The Effect of Errors of Measurement 


When the quantitative reaction is difficult to measure precisely,
it is likely that the observations $y'$ of the response $y$ will follow an
“error” distribution, say $\phi(y', y)$. If, as is commonly found in practical
applications, $\phi(y', y)$ is a normal distribution with mean $(y + \lambda)$ and
variance $\rho^2$, where $\lambda$ and $\rho$ are independent of $y$, it follows that, for a
given $x$, the observations $y'$ are also normally distributed, with mean
$[g(x) + \lambda]$ and variance $[\sigma^2 + \rho^2]$. The effect of unbiased and homoscedastic
errors of measurement of the response is thus to increase the
variability of the quantitative dosage-response relationship. From (4)
the corresponding quantal dosage-response relationship will take the
form,

$$Y_1 = [g(x) + \lambda - y_1]/[\sigma^2 + \rho^2]^{1/2}. \qquad (6)$$


In certain circumstances the dosage metameter may also be subject 
to errors of measurement, which will, in turn, modify the quantitative 
relationship between the observed dosage metameter x’ and the ob- 
served response metameter y’. Under conditions of biological assay 
the values of $x'$ may be regarded as “controlled” observations (Berkson,
[1950]) and we may write $x = x' + \xi$, where $x'$ is fixed and $\xi$ represents
the error of measurement. If $\xi$ is normally distributed, with mean $\mu$
and variance $\tau^2$, where $\mu$ and $\tau$ are independent of $x$, then the distribution
of the true values of the dosage $x$ will also be normal, with mean $(x' + \mu)$
and the same variance. When there is a linear relationship between
the true values of the dosage and response metameters of the form,

$$E(y) = \alpha + \beta x, \qquad (7)$$

the observed values $y'$ of the response metameter will be normally
distributed, with mean $[\alpha + \lambda + \beta(x' + \mu)]$ and variance $[\sigma^2 + \rho^2 + \beta^2\tau^2]$,
and the quantal dosage-response relationship will take the form,

$$Y_1 = [\alpha + \beta(x' + \mu) + \lambda - y_1]/[\sigma^2 + \rho^2 + \beta^2\tau^2]^{1/2}. \qquad (8)$$


Expression (8) represents a generalisation of a result given by Finney 
[1952], who only considered the effect of errors in the dosage metameter. 

If the quantitative response relationship is not linear, the observed 
values y’ of the response metameter will no longer be normally distri- 
buted and the expression (2) will not hold good. 

Thus, when the “true” quantal dosage-response relationship is 
linear and the dosage and response metameters are subject to homo- 
scedastic normally distributed errors of measurement, the corresponding 
“apparent” dosage-response relationship is also linear, although the
slope of the regression line is reduced. 
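
The attenuation described by expression (8) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the function name `apparent_probit_line` and all parameter values are hypothetical, and the linear model (7) with normally distributed errors in both metameters is assumed.

```python
import math


def apparent_probit_line(alpha, beta, sigma, y1, lam=0.0, rho=0.0, mu=0.0, tau=0.0):
    """Intercept and slope of the quantal line in the observed dosage x'.

    lam, rho: bias and s.d. of the response errors; mu, tau: bias and s.d.
    of the dosage errors. Setting all four to zero recovers equation (4).
    """
    scale = math.sqrt(sigma ** 2 + rho ** 2 + beta ** 2 * tau ** 2)
    intercept = (alpha + beta * mu + lam - y1) / scale
    slope = beta / scale
    return intercept, slope


# Hypothetical values chosen only to show the direction of the effect.
true_line = apparent_probit_line(alpha=0.0, beta=2.0, sigma=1.0, y1=0.0)
apparent_line = apparent_probit_line(alpha=0.0, beta=2.0, sigma=1.0, y1=0.0,
                                     rho=1.0, tau=0.5)
```

In this example the slope falls from $\beta/\sigma$ to $\beta/[\sigma^2 + \rho^2 + \beta^2\tau^2]^{1/2}$, while the line remains straight.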


4. Multinomial Responses 


In general the quantal responses most frequently encountered in 
practice are of the binomial type, but there are some applications in 
which the underlying quantitative response scale may be subdivided 
into a number of ordered and mutually exclusive intervals and it is 
necessary to consider a multinomial response. When these conditions
apply, the underlying quantitative response scale is subdivided at the
points $y = y_1, y_2, \cdots, y_{k-1}$, into $k$ classes $C_1, C_2, \cdots, C_k$, which
respectively correspond to the intervals $(-\infty, y_1), (y_1, y_2), \cdots,
(y_{k-1}, +\infty)$. At the beginning of the assay each subject belongs
to class $C_1$ and the corresponding value of the response metameter on
the underlying scale is $-\infty$. From (2) the probability that the response
metameter belongs to the class $C_i$ after the application of dosage
$x$ may be written,

$$Q_i(x) = P_{i-1}(x) - P_i(x) = \frac{1}{\sqrt{2\pi}} \int_{\{g(x) - y_i\}/\sigma}^{\{g(x) - y_{i-1}\}/\sigma} \exp\{-\tfrac{1}{2}t^2\}\, dt, \qquad (9)$$

where $y_0 = -\infty$ and $y_k = +\infty$, so that $P_0(x) = 1$ and $P_k(x) = 0$.
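
The class probabilities in (9) are differences of normal tail areas, which can be sketched as below. This is an illustration only: the function names are hypothetical, $g(x)$ is assumed to be linear, and the example call uses the parameter values quoted in Section 5 with boundaries at 0 and 1 and, as a further assumption, logarithms to base 10 of the exposure period.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def class_probabilities(x, alpha, beta, sigma, cuts):
    """Return [Q_1(x), ..., Q_k(x)] for g(x) = alpha + beta * x.

    cuts holds the boundaries y_1 < ... < y_{k-1} on the underlying scale.
    """
    g = alpha + beta * x
    # P_i(x) = Pr(y > y_i); P_0 = 1 and P_k = 0 close the scale.
    exceed = [1.0] + [normal_cdf((g - c) / sigma) for c in cuts] + [0.0]
    return [exceed[i] - exceed[i + 1] for i in range(len(cuts) + 1)]


# Assumed inputs: Section 5 estimates, boundaries 0 and 1, base-10 log of 5.8 years.
probs = class_probabilities(math.log10(5.8), -7.52, 4.61, 1.56, [0.0, 1.0])
```

The probabilities sum to one by construction, since successive tail areas telescope.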


If the test subjects are assigned at random to $l$ groups in such a
way that the $j$th group contains $n_j$ subjects, of whom $r_{ij}$ are assigned
to the class $C_i$ when the group is exposed to a dose $x_j$, the logarithm
of the likelihood of any given set of observations may be written,

$$L = \sum_{j=1}^{l} \sum_{i=1}^{k} r_{ij} \log Q_i(x_j), \qquad (10)$$

where $Q_i(x_j)$ is a function of the parameters of the underlying quantitative
response relationship. The maximum likelihood estimates $\hat{u}$,
$\hat{v}$, $\cdots$ of these parameters may be obtained directly or by the solution
of the equations,

$$\frac{\partial L}{\partial u} = \sum_{j=1}^{l} \sum_{i=1}^{k} \frac{r_{ij}}{Q_i(x_j)} \frac{\partial Q_i(x_j)}{\partial u} = 0. \qquad (11)$$


The asymptotic variances and covariances of the maximum likelihood
estimates may be evaluated in the usual way and result in,

$$\operatorname{cov}(\hat{u}, \hat{v}) = \left\{-E\left(\frac{\partial^2 L}{\partial u\, \partial v}\right)\right\}^{-1}. \qquad (12)$$

Hence the parameters associated with the underlying quantitative
response may be estimated from the semi-quantal response. It should
be noted that this statement holds good only if it is known what parameters
there are to be estimated and that the method of analysis depends
on the validity of the relatively large number of assumptions which
must necessarily be made.
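
A minimal sketch of the log likelihood (10) for a linear $g(x)$ is given below. The function names, the doses and the counts are all hypothetical and serve only to show how the quantity to be maximised is assembled; any general-purpose optimiser or Newton-Raphson scheme could then be applied to it.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(params, doses, counts, cuts):
    """Sum of r_ij * log Q_i(x_j) for g(x) = alpha + beta * x."""
    alpha, beta, sigma = params
    total = 0.0
    for x, row in zip(doses, counts):
        g = alpha + beta * x
        exceed = [1.0] + [normal_cdf((g - c) / sigma) for c in cuts] + [0.0]
        for i, r in enumerate(row):
            if r > 0:  # empty classes contribute nothing
                total += r * math.log(exceed[i] - exceed[i + 1])
    return total


# Hypothetical data: three dose groups, three ordered classes.
doses = [0.5, 1.0, 1.5]
counts = [[18, 2, 0], [10, 7, 3], [3, 8, 9]]
cuts = [0.0, 1.0]
ll_increasing = log_likelihood((-1.0, 1.5, 1.0), doses, counts, cuts)
ll_decreasing = log_likelihood((5.0, -3.0, 1.0), doses, counts, cuts)
```

For these made-up counts, an increasing response line gives a much larger log likelihood than a decreasing one, as would be expected.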


5. Example 


The practical application of the procedure described in Section 4
may be illustrated by an example from the field of research into the
causes of pneumoconiosis amongst coal miners. As part of the National
Coal Board’s Pneumoconiosis Field Research (Fay, [1957]), a radiological
survey was carried out of the working population at a particular
colliery and each man was assigned to one of three classes (denoted
category 0, category 1, and category 2 or more) according to the degree
of abnormality revealed in his X-ray film. This classification corresponds
to an arbitrary subdivision of the continuous scale of abnormality
associated with simple pneumoconiosis into three ordered and
mutually exclusive classes. It may be assumed a priori that each man
starts at the lower limit of category 0 on entering the mining industry.

The process of film reading is subject to errors of observation, which
may be regarded as normally distributed with zero bias and constant
variance at all points on the scale of abnormality (Ashford, [1960]).

During the radiological survey a working history, including details 
of all the occupations followed since leaving school, was obtained for 
each man examined and the population was divided into groups ac- 
cording to the type of dust hazard to which the men concerned had 
been exposed (for the joint effect of different types of exposure see 
Ashford [1958]). For any one of these “exposure groups” a measure 
of the past hazard is provided by the period spent in the particular 
occupation concerned. Checks on the accuracy of the industrial histories 
have shown that they are generally reliable and for the purposes of 
this example they will be regarded as free from error. 

Thus the situation is equivalent to a semi-quantal response in
biological assay in which the response measurement is subject to un- 
biased and homoscedastic normally distributed errors of measurement. 
The methods described above may therefore be applied to determine 
the quantitative dosage-response relationship, which from experience 
of many analyses of the type described below may be assumed to be a 
linear function of the logarithm of the period of exposure. The results 
obtained for one particular exposure group are shown in Table 1. The 
periods of exposure quoted in this table represent the midpoints of 
the various class-intervals, although it was not considered worthwhile 
to make any allowance for grouping in the analysis. 


TABLE 1


PERIOD OF EXPOSURE AND PREVALENCE OF PNEUMOCONIOSIS AMONGST A GROUP OF
COAL MINERS WHO HAVE WORKED PREDOMINANTLY IN THE SAME OCCUPATION


z 


Period Number of men 
spent 
(years) Category 0 Category 1 Category 2 or more 
Observed | Expected || Observed | Expected || Observed | Expected 
5.8 98 97.5 0.4 0 0.1 
15.0 51/183 | 49.2)181.6 3.6 79.4 174 1.34.1 
5.4 3 2.7 
8.3 8 5.5 
10.6 9 8.7 
8.8 8 8.7 
6.9 10 8.2 
12.0

The underlying response metameter may be chosen in such a way
that the boundaries between categories 0 and 1 and between categories
1 and 2 are located respectively at the points $y = 0$ and $y = 1$. This
arbitrary choice of boundaries for the response categories does not,
of course, alter the linearity of the dosage-response relationship. The
scale of radiological abnormality is thus divided into the three intervals
$(-\infty, 0)$, $(0, 1)$, and $(1, +\infty)$ which respectively correspond to
categories 0, 1, and 2 or more, and the graded response is defined accordingly.

If we substitute $g(x) = \alpha + \beta x$ in equation (9), the maximum
likelihood estimates of the three parameters $\alpha$, $\beta$, and $\sigma$ associated
with the quantitative dosage-response relationship may be obtained
by the solution of equations (11), where $u = \alpha$, $\beta$, and $\sigma$. In this context,
$\sigma$ represents a summation of the variability of the underlying “true”
dosage-response relationship and the variability introduced by errors
of measurement of the response metameter, and no attempt can be
made in this analysis to distinguish between these two components.

As in most applications of this method of analysis there is no direct 
solution of the maximum likelihood equations and it is necessary to 
employ some form of iterative procedure. Initial approximations $\tilde\alpha$,
$\tilde\beta$, and $\tilde\sigma$ for the three parameters may be obtained by separate analyses
of the data in terms of the two types of binomial response defined by
category 1 or more (i.e., $y > 0$) and category 2 or more (i.e., $y > 1$).
Reference to equation (4) shows that the corresponding quantal response
lines are,

$$Y_1 = a_1 + b_1x = (\tilde\alpha + \tilde\beta x)/\tilde\sigma,$$
and
$$Y_2 = a_2 + b_2x = (\tilde\alpha + \tilde\beta x - 1)/\tilde\sigma,$$

where $x$ denotes the logarithm of the period of exposure.

Hence $\tilde\sigma = 1/(a_1 - a_2)$, $\tilde\alpha = a_1\tilde\sigma$, and $\tilde\beta = b_1\tilde\sigma = b_2\tilde\sigma$. Taking
$\tilde\beta = \tfrac{1}{2}(b_1 + b_2)\tilde\sigma$ the initial estimates of $\alpha$, $\beta$, and $\sigma$ may be derived from
the values of $a_1$, $a_2$, $b_1$, and $b_2$. The minimum-logit $\chi^2$ procedure
(Berkson, [1949]) is to be preferred for this preliminary analysis as
the calculations involved are considerably easier than those associated 
with the maximum likelihood solution. 
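
The conversion from the two separately fitted quantal lines to starting values can be sketched as follows. The function name `initial_estimates` is hypothetical. As an illustration only, the inputs are the rounded coefficients of the two fitted quantal lines quoted later in this section, rather than the results of a genuine preliminary analysis, which are not given in the paper.

```python
def initial_estimates(a1, b1, a2, b2):
    """Starting values (alpha, beta, sigma) from Y1 = a1 + b1*x and Y2 = a2 + b2*x.

    The boundaries are assumed to lie at y = 0 and y = 1, so that the two
    intercepts differ by 1/sigma; the two slopes are averaged.
    """
    sigma = 1.0 / (a1 - a2)
    alpha = a1 * sigma
    beta = 0.5 * (b1 + b2) * sigma
    return alpha, beta, sigma


# Illustrative inputs taken from the fitted lines quoted below.
alpha0, beta0, sigma0 = initial_estimates(-4.82, 2.96, -5.46, 2.96)
```

With these inputs the sketch returns values close to the maximum likelihood solution quoted below, which is a consistency check on the relations rather than an independent result.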

After the calculation of the initial estimates, the Newton-Raphson 
procedure was applied to the data given in Table 1 and the following 
solutions of the maximum likelihood equations were obtained: 


$$\hat\alpha = -7.52, \qquad \hat\beta = 4.61, \qquad \hat\sigma = 1.56.$$


The asymptotic variances and covariances of these estimates are,
from (12),

var $\hat\alpha$ = 1.7517,  var $\hat\beta$ = 0.6952,  var $\hat\sigma$ = 0.0436,
cov $(\hat\alpha, \hat\beta)$ = -1.0952,  cov $(\hat\alpha, \hat\sigma)$ = -0.1940,  cov $(\hat\beta, \hat\sigma)$ = 0.1123.


The quantal response lines relating to category 1 or more and category
2 or more pneumoconiosis are,

$$Y_1 = -4.82 + 2.96x$$
and
$$Y_2 = -5.46 + 2.96x.$$

The corresponding quantitative response line is

$$y = -7.52 + 4.61x.$$


The corresponding quantal and quantitative response lines are shown
in Figures 1 and 2 respectively. It should be noted that the fact that 
the quantal response lines are parallel means that better estimates of 
these lines may be derived from this method of analysis than would be 
obtained by a consideration of each response separately. 

The fit of the observations to the hypothesis on which the analysis 
is based may be examined in Table 1, which gives details of both the 
“observed” and the “expected” numbers of men showing evidence of 
the various types of response. The agreement between the two sets 
of figures is good and there is no evidence of any significant difference 
between them. The value of $\chi^2$, after grouping the first three and last
two entries in the table, is 4.62, which for 7 degrees of freedom is not
significant ($P \approx 0.7$).

Similar analyses to that described above have proved useful for
dealing with the data from other exposure groups which include a
sufficient number of men with category 2 or more pneumoconiosis.


USE OF CONTINGENCY TABLES IN THE ANALYSIS
OF CONSUMER PREFERENCE STUDIES


R. L. ANDERSON 


Institute of Statistics, North Carolina State College 
Raleigh, North Carolina, U.S. A. 


1. Introduction 


A consumer preference study involving three varieties of snap beans 
was conducted at Mississippi State College. One lot of each of the 
three varieties ($V_1$, $V_2$, and $V_3$) was displayed in retail stores, and
each of $n$ consumers was asked to rank the beans according to first,
second, and third choices. The data obtained in one store on one day
are presented in Table 1. I was asked if the usual $\chi^2$-test with four
degrees of freedom could be used to test for independence of varieties
and ranks, i.e., that each variety had the same chance (1/3) of receiving
a given rank, regardless of rank.


TABLE 1

CONSUMER RANKINGS OF THREE VARIETIES OF SNAP BEANS

                   Rank
Variety       1      2      3     Total
V1           42     64     17      123
V2           31     16     76      123
V3           50     43     30      123
Total       123    123    123      369


This is not the usual problem of a contingency table with fixed 
border totals, because repeated sampling is not a random rearrangement 
of rn items, subject to the border restrictions, where for the sake of 
generality, we will assume there are r varieties. 

Instead, each of the $n$ consumers acts independently, but has only
$r!$ possible preference sequences. For $r = 3$, these are $V_1V_2V_3$, $V_1V_3V_2$,
$V_2V_1V_3$, $V_2V_3V_1$, $V_3V_1V_2$, and $V_3V_2V_1$. In other words, each consumer
must examine all $r$ varieties and use all $r$ ranks. Since every
consumer is represented in each column (rank), one could make a test
of varietal differences at each rank. Suppose a separate $Q_j^2$ is computed
for each rank, where

$$Q_j^2 = r \sum_{i=1}^{r} (n_{ij} - n/r)^2/n,$$

$n_{ij}$ being the number of times variety $i$ was given the rank $j$. Under
the null hypothesis that all varieties were equally preferred in the
population, $Q_j^2$ would be asymptotically distributed as $\chi^2$ with $(r - 1)$
degrees of freedom. The over-all $Q^2$ is simply

$$Q^2 = \sum_{j=1}^{r} Q_j^2.$$

$Q^2$ does not approach a $\chi^2$ with $r(r - 1)$ degrees of freedom, because
the $Q_j^2$ are not independent. However, the expected value of $Q^2$ is
$r(r - 1)$. If $Q^2$ is actually distributed (asymptotically) as $k\chi^2$ with
$(r - 1)^2$ degrees of freedom, it appears that $k = r/(r - 1)$. Hence
one could use as an over-all test statistic, $(r - 1)Q^2/r$, which would
have an asymptotic $\chi^2$-distribution with the usual $(r - 1)^2$ degrees of
freedom. A formal proof of the justification for this procedure follows.
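
The over-all statistic can be computed directly from the table of counts. The sketch below is only illustrative; the function name `overall_q2` is hypothetical, and the third-rank entry for $V_2$ is taken as 76, the value implied by the row and column totals of 123 in Table 1.

```python
def overall_q2(table):
    """Return (Q^2, (r - 1) * Q^2 / r) for an r x r table of rank counts.

    table[i][j] is the number of consumers giving variety i the rank j.
    """
    r = len(table)
    n = sum(table[0])  # every consumer ranks every variety exactly once
    q2 = sum(r * (cell - n / r) ** 2 / n for row in table for cell in row)
    return q2, (r - 1) * q2 / r


# Table 1, with the V2 third-rank count taken as 76 (implied by the totals).
table1 = [[42, 64, 17],
          [31, 16, 76],
          [50, 43, 30]]
q2, adjusted = overall_q2(table1)
```

The adjusted value is the one to refer to the $\chi^2$ distribution with $(r - 1)^2 = 4$ degrees of freedom.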


2. Asymptotic Distribution of Q²


Let the number of consumers selecting the $h$th preference sequence
(out of $r!$) be $y_h$, where $\sum_{h=1}^{r!} y_h = n$. The probability distribution
for the $\{y_h\}$ is the usual multinomial,

$$\frac{n!}{\prod_h y_h!} \prod_h p_h^{y_h},$$

where $p_h$ is the probability of selecting the $h$th preference sequence.
Under the null hypothesis, $p_h = 1/r! = p$ and $E(y_h) = np$, Var $(y_h) =
np(1 - p)$, and Cov $(y_h, y_{h'}) = -np^2$.

The number of times $(n_{ij})$ variety $i$ is given the rank $j$ will be the
sum of $(r - 1)!$ of the $y_h$. Hence, since $(r - 1)! = 1/rp$, $E(n_{ij}) =
n/r$ and

$$\text{Var}(n_{ij}) = \frac{np(1 - p)}{rp} - \frac{n(1 - rp)}{r^2} = \frac{n(r - 1)}{r^2}.$$

Let $x_{ij} = (n_{ij} - n/r)/\sqrt{n(r - 1)/r^2}$, so that $Q^2 = (r - 1) \sum_{i,j} x_{ij}^2/r$.


From the above, we see that $x_{ij}$ is asymptotically normally distributed
with zero mean and unit variance, $N(0, 1)$. One can obtain the same
result by regarding a given $n_{ij}$ as being binomially distributed with
$p_{ij} = 1/r$, so that Var $(n_{ij}) = n(r - 1)/r^2$.

The $\{x_{ij}\}$ are correlated but not according to a multinomial distribution.
Let us designate the $(r^2 \times r^2)$ correlation matrix as $R(r^2 \times r^2)$.
Let $a_i = \sum_j x_{ij}$ and $b_j = \sum_i x_{ij}$. Since $\sum_i n_{ij} = \sum_j n_{ij} = n$, $a_i =
b_j = 0$. The correlation between $x_{ij}$ and $x_{it}$ will be the same for all $i$
and for all $j \ne t$; let this be $r_1$. Similarly for $x_{ij}$ and $x_{sj}$ $(i \ne s)$. From
the definition of the $y$'s (from which the $x$'s are derived), it is obvious
that varieties and rankings are interchangeable. The correlation
between $x_{ij}$ and $x_{st}$ should be the same for all $i \ne s$ and $j \ne t$; let this
be $r_2$. We have shown that Var $(x_{ij}) = 1$. Since $a_i$ and $b_j$ are always
zero, they have zero covariances. Hence

$$E(a_ia_s) = E(b_jb_t) = rr_1 + r(r - 1)r_2 = 0$$
$$E(a_ib_j) = 1 + 2(r - 1)r_1 + (r - 1)^2r_2 = 0.$$

The solution to these equations is $r_1 = -1/(r - 1)$ and $r_2 =
1/(r - 1)^2 = r_1^2$. To check these results, we note that $a_i$ and $b_j$ also
have zero variances, i.e., Var $(a_i)$ = Var $(b_j) = r + r(r - 1)r_1 = 0$.
This equation is satisfied by $r_1 = -1/(r - 1)$. For our 3 × 3 example,
$r_1 = -\tfrac{1}{2}$ and $r_2 = \tfrac{1}{4}$.
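
These correlations can be verified by enumerating the $r!$ equally likely preference sequences of a single consumer. Because each $n_{ij}$ is a sum of $n$ independent indicators, the correlations of the indicators are also those of the $n_{ij}$. The sketch is illustrative and the function name `indicator_correlations` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def indicator_correlations(r):
    """Return (r1, r2) for the rank indicators of one consumer under the null."""
    seqs = list(permutations(range(r)))  # seq[i] = rank given to variety i
    total = len(seqs)
    p = Fraction(1, r)  # chance that a given variety receives a given rank

    def corr(i, j, s, t):
        # Correlation between "variety i has rank j" and "variety s has rank t".
        both = Fraction(sum(1 for q in seqs if q[i] == j and q[s] == t), total)
        return (both - p * p) / (p * (1 - p))

    r1 = corr(0, 0, 0, 1)  # same variety, different ranks
    r2 = corr(0, 0, 1, 1)  # different varieties, different ranks
    return r1, r2


r1_three, r2_three = indicator_correlations(3)
r1_four, r2_four = indicator_correlations(4)
```

The enumeration reproduces $r_1 = -1/(r - 1)$ and $r_2 = 1/(r - 1)^2$ for the cases tried.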
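The correlation matrix can be written as follows:

$$R(r^2 \times r^2) = \begin{bmatrix} R_1 & r_1R_1 & \cdots & r_1R_1 \\ r_1R_1 & R_1 & \cdots & r_1R_1 \\ \vdots & \vdots & & \vdots \\ r_1R_1 & r_1R_1 & \cdots & R_1 \end{bmatrix},$$

where

$$R_1(r \times r) = \begin{bmatrix} 1 & r_1 & \cdots & r_1 \\ r_1 & 1 & \cdots & r_1 \\ \vdots & \vdots & & \vdots \\ r_1 & r_1 & \cdots & 1 \end{bmatrix}.$$
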
One can obtain these results by use of the variances and covariances
of the $y$'s. Since $n_{ij}$ is the sum of the $(1/rp)$ $y$'s with $V_i$ in the $j$th
position, $n_{ij}$ and $n_{it}$ $(j \ne t)$ will have no $y$'s in common. Hence

$$\text{Cov}(n_{ij}, n_{it}) = -np^2/(rp)^2 = -n/r^2,$$

$$\text{Cov}(x_{ij}, x_{it}) = -1/(r - 1).$$

Similarly for $n_{ij}$ and $n_{sj}$ $(i \ne s)$. For $n_{ij}$ and $n_{st}$ $(i \ne s, j \ne t)$, there
will be $(r - 2)! = 1/r(r - 1)p$ $y$'s in common. Hence

$$\text{Cov}(n_{ij}, n_{st}) = \frac{np(1 - p)}{r(r - 1)p} - np^2\left[\frac{1}{r^2p^2} - \frac{1}{r(r - 1)p}\right] = \frac{n}{r^2(r - 1)},$$

$$\text{Cov}(x_{ij}, x_{st}) = 1/(r - 1)^2.$$

From these results, we conclude that $Q^2$ involves the sum of squares
of $r^2$ correlated variates, which are asymptotically normally distributed.
Let $X'$ be the row vector of the $x$'s, i.e.,

$$X' = (x_{11}, x_{12}, \cdots, x_{1r}, x_{21}, x_{22}, \cdots, x_{rr}).$$


Hence $rQ^2/(r - 1) = X'X$. Suppose we make a linear transformation,
$X = HZ$, such that $H'H = R$; then $X'X = Z'(H'H)Z = Z'RZ =
\sum \lambda_iu_i$, where the $\lambda_i$ are the latent roots of $R$. If $E(ZZ') = I$, the $u_i$
will be asymptotically distributed as independent $\chi^2$-variates, each
with one degree of freedom. In this case,

$$H'H = R = E(XX') = HE(ZZ')H' = HH'.$$

This means that $H$ must be a symmetrical matrix. But it is certainly
possible to find such a transformation matrix, since $R$ is symmetrical.
It is not necessary to determine the elements of $H$; all that is needed
here is to show that a suitable $H$ can be found to make $E(ZZ') = I$.
Hence our problem is now reduced to that of determining the latent
roots of $R$.

$R$ can be written as the direct product $R_1 \times R_1$. The latent roots
of $R$ are simply the pairwise products of the latent roots of $R_1$. Since $R_1$ is a
simple circulant matrix, it has one root $\lambda_1 = 1 + (r - 1)r_1 = 0$ and
$(r - 1)$ roots $\lambda_2 = (1 - r_1) = r/(r - 1)$. Hence $(r - 1)^2$ roots of $R$
are $[r/(r - 1)]^2$ and the other $(2r - 1)$ roots are zero. Therefore
$\sum \lambda_iu_i$ is asymptotically distributed as $r^2\chi^2/(r - 1)^2$, where $\chi^2$ has
$(r - 1)^2$ degrees of freedom. This confirms our original supposition,
that $(r - 1)Q^2/r$ has a limiting $\chi^2$-distribution. For the data in Table
1, $(r - 1)Q_0^2/r = 2(79.56)/3 = 53.04$, with 4 degrees of freedom, where
$Q_0^2$ is the particular value of $Q^2$ for this sample.

V. J. Bofinger investigated the distribution of $Q^2$ for the three
variety case ($r = 3$) with $n = 30$ consumers. One thousand 3 × 3 tables
were derived by random sampling, each table having a total of 30
judgments, and $Q^2$ computed for each. Each judgment contributed
a one to each row and column of the table. The sampling procedure
was based on a random selection of one of the six preference sequences
by use of a random uniform deviate, $w$. In this case, $y_h$ would be the
number of observations for which $w$ fell between $(h - 1)/6$ and $h/6$.
In the computations, 1/6 was replaced by .1666666666, 1/3 by .3333333333,
$\cdots$, 1 by .9999999999. After determining the $\{y_h\}$, where $\sum_h y_h = 30$,
the corresponding $n_{ij}$ were determined and $Q^2$ computed. In the first
run of one thousand 3 × 3 tables, the average value of $Q^2$ was 5.888,
its variance 16.2, $\mu_3 = 91.6$, and $\mu_4 = 1605$. On a second run of one
thousand 3 × 3 tables, the average value of $Q^2$ was 6.020 and its variance
17.5. If $Q^2$ is $3\chi^2/2$, where $\chi^2$ has four degrees of freedom, $E(Q^2) =
6$, Var $(Q^2) = 18$, $\mu_3(Q^2) = 108$, and $\mu_4(Q^2) = 1944$. The similarity
is quite striking.
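
A sampling experiment of this kind is easy to repeat. The sketch below is not Bofinger's program: it draws each consumer's preference sequence uniformly from the $r!$ possibilities using Python's standard generator, and the function name `simulate_q2` and the seed are arbitrary choices.

```python
import random
from itertools import permutations


def simulate_q2(n_consumers=30, n_tables=1000, r=3, seed=1959):
    """Return the sample mean and variance of Q^2 over simulated r x r tables."""
    rng = random.Random(seed)
    sequences = list(permutations(range(r)))  # seq[variety] = rank
    values = []
    for _table in range(n_tables):
        counts = [[0] * r for _row in range(r)]
        for _consumer in range(n_consumers):
            seq = rng.choice(sequences)
            for variety, rank in enumerate(seq):
                counts[variety][rank] += 1
        expected = n_consumers / r
        q2 = sum(r * (c - expected) ** 2 / n_consumers
                 for row in counts for c in row)
        values.append(q2)
    mean = sum(values) / n_tables
    var = sum((v - mean) ** 2 for v in values) / (n_tables - 1)
    return mean, var


sim_mean, sim_var = simulate_q2()
```

The sample mean and variance can then be compared with the values 6 and 18 expected if $Q^2$ is distributed as $3\chi^2/2$ with four degrees of freedom; individual runs will vary with the seed.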


3. Alternative Analyses 


3.1 Test of independence of first and second rankings 


There are many other methods of analyzing consumer preference 
data of this kind. For example, one might desire a test of independence 
of first and second rankings (the third ranking is fixed once the first 
two are known). For this test one needs the frequencies $(y_h)$ of the
six preference sequences. Suppose we let $y_{12}$ be the frequency of the
sequence $V_1V_2V_3$, $y_{13}$ for $V_1V_3V_2$, $\cdots$, $y_{32}$ for $V_3V_2V_1$, with true
probabilities $p_{12}, p_{13}, \cdots, p_{32}$. As indicated above, the $\{y_h\}$ follow
a multinomial distribution. The likelihood test of independence of
first and second rankings is found by estimating the $\{p_h\}$ subject to
the usual restriction $\sum p_h = 1$ and the null restriction that $p_{12}p_{23}p_{31} =
p_{13}p_{32}p_{21}$. In order to make the test, one must first solve the cubic
equation in $\lambda$:

$$(y_{12} - \lambda)(y_{23} - \lambda)(y_{31} - \lambda) = (y_{13} + \lambda)(y_{32} + \lambda)(y_{21} + \lambda).$$

Once $\lambda$ has been obtained, under the null hypothesis the quantity

$$X_1^2 = \lambda^2 \sum_h (1/m_h)$$

is asymptotically distributed as $\chi^2$ with one degree of freedom, where
$m_h = y_h - \lambda$ or $y_h + \lambda$, as in the equation.

A somewhat simpler test of independence of first and second rankings
might be to use the null restriction that $p_{12} + p_{23} + p_{31} = p_{13} + p_{32} +
p_{21}$. This results in the testing statistic

$$X_2^2 = (y_{12} + y_{23} + y_{31} - y_{13} - y_{32} - y_{21})^2/n,$$

which also has a limiting $\chi^2$-distribution with one degree of freedom.

One could also test the general hypothesis that each $p_h$ is equal
to 1/6. The testing statistic would be $X_3^2 = 6 \sum_h (y_h - n/6)^2/n$, which
has a limiting $\chi^2$-distribution with five degrees of freedom. It is
interesting to note that

$$X_3^2 = X_2^2 + 2Q^2/3.$$

3.2 Construction of orthogonal single-degree-of-freedom contrasts 


It is possible to construct simple contrasts, each involving
one-degree-of-freedom $\chi^2$ tests. For the 3 × 3 table, the following are
suggested:


                 Coefficient of n_ij in contrast
Contrast   n11  n12  n13  n21  n22  n23  n31  n32  n33     Var
l1           0    0    0   -1    0    1    0    0    0     2n/3
l2           1    0   -1    0    0    0   -1    0    1     2n
l3           0    0    0    1   -2    1    0    0    0     2n
l4           0   -1    0    0    0    0    0    1    0     2n/3


The contrasts are $l_1 = (n_{23} - n_{21})$, $l_2 = (n_{11} - n_{13} - n_{31} + n_{33})$, etc.
Approximate $\chi^2$-statistics, under the null hypothesis, each with one
degree of freedom, are given by $3l_1^2/2n$, $l_2^2/2n$, $l_3^2/2n$, and $3l_4^2/2n$. $l_1$
can be used to test the null hypothesis of no “linear ranking effect”
for $V_2$. $l_2$ tests whether the linear ranking effect is the same for $V_1$
and $V_3$. $l_3$ and $l_4$ are “quadratic ranking effects.” It is easy to show
that $3l_4$ is equivalent to $(n_{11} - 2n_{12} + n_{13} - n_{31} + 2n_{32} - n_{33})$, because
$(n_{11} + n_{12} + n_{13}) - (n_{31} + n_{32} + n_{33}) = 0$. Var $(l_1)$ = Var $(l_4)$ =
$n(4 + 2)/9 = 2n/3$; Var $(l_2) = n(8/9 + 8/9 + 4/18) = 2n$; Var $(l_3) =
n(12 + 6)/9 = 2n$. $l_4$ is also a varietal contrast.

The sum of the first two $\chi^2$-values is the Friedman [1937] rank sum
statistic,

$$\chi_r^2 = \frac{12}{nr(r + 1)} \sum_{i=1}^{r} r_i^2 - 3n(r + 1),$$

where $r_i$ is the rank total for the $i$th variety; for $r = 3$ this reduces to
$\sum_i (r_i - 2n)^2/n$. A formal proof of this equality is as follows:


BIOMETRICS, DECEMBER 1959 


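$$\frac{3l_1^2}{2n} + \frac{l_2^2}{2n} = \frac{3(n_{23} - n_{21})^2 + (n_{11} - n_{13} - n_{31} + n_{33})^2}{2n}$$

$$= \frac{(n_{23} - n_{21})^2}{n} + \frac{(n_{11} + n_{31} - n_{13} - n_{33})^2 + (n_{11} - n_{13} - n_{31} + n_{33})^2}{2n}$$

$$= \frac{(n_{23} - n_{21})^2 + (n_{11} - n_{13})^2 + (n_{31} - n_{33})^2}{n} = \sum_{i=1}^{3} \frac{(r_i - 2n)^2}{n},$$

since $r_i - 2n = n_{i3} - n_{i1}$.

The proof rests on the fact that $n_{11} + n_{31} - n_{13} - n_{33} = n_{23} - n_{21}$.
These results could be extended to $r > 3$.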

For the data in Table 1, the $\chi_1^2$-values for $l_1$, $l_2$, $l_3$, and $l_4$,
respectively, are 24.70, 0.10, 22.86, and 5.38.
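
The first two values can be checked from rank totals alone. The Python sketch below is only an illustration: it assumes n = 123 consumers and recovers the rank totals from the values $a_i = 3n - r_i$ quoted for Table 1 in Section 3.3, and it uses the identity $n_{i3} - n_{i1} = r_i - 2n$, which holds because every consumer ranks each variety exactly once. The function and variable names are hypothetical.

```python
# Sketch: Friedman statistic for r = 3 varieties and its split into two
# single-degree-of-freedom components. The rank totals are inferred from the
# a_i values quoted for Table 1 (a_i = 3n - r_i); that inference is an
# assumption of this example, not something stated in this section.

def friedman_components(rank_totals, n):
    """Return (chi_l1, chi_l2, friedman) for three varieties.

    d_i = r_i - 2n equals n_i3 - n_i1 (third-place minus first-place counts),
    so l1 = n23 - n21 = d_2 and l2 = n11 - n13 - n31 + n33 = d_3 - d_1.
    """
    d1, d2, d3 = (r - 2 * n for r in rank_totals)
    l1 = d2
    l2 = d3 - d1
    chi_l1 = 3 * l1 ** 2 / (2 * n)   # Var(l1) = 2n/3 under the null hypothesis
    chi_l2 = l2 ** 2 / (2 * n)       # Var(l2) = 2n under the null hypothesis
    friedman = (d1 ** 2 + d2 ** 2 + d3 ** 2) / n
    return chi_l1, chi_l2, friedman


n = 123
a = (148, 78, 143)                           # values quoted for Table 1
rank_totals = tuple(3 * n - ai for ai in a)  # r_i = 3n - a_i
chi_l1, chi_l2, friedman = friedman_components(rank_totals, n)
print(round(chi_l1, 2), round(chi_l2, 2), round(friedman, 2))
```

Under these assumptions the two components agree with the quoted 24.70 and 0.10 to the precision given, and their sum equals the Friedman statistic.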

The selection of the above contrasts was entirely arbitrary. How- 
ever, in many practical situations, the experimenter can select com- 
parisons of interest to him before the experiment is started. As a 
matter of fact, one of the prime duties of a statistician should be to 
encourage experimenters to indicate which comparisons should be made. 
Even if the experimenter is unable to make such a decision before he 
sees the data, one should not be deterred from examining the results 
to see if some comparisons seem to produce striking results. The 
experimenter should then want to determine if there is some scientific 
reason for these results. One can use some of the current methods of 
making tests for multiple comparisons in assigning probabilities to 
these results or in making confidence statements. At any rate, the 
important thing is to reduce $\chi^2$-tests to those involving single degrees
of freedom (squares of normal deviates). 

Since the experimenter is generally most interested in varietal 
contrasts, especially at the highest ranking, a more informative set 
of orthogonal contrasts might be 


— , — 2nn +m , 
ly = M2 — M3 — + Naz , = — 

It may be more informative to consider contrasts based on the 
y’s instead of the n’s. y-contrasts are easier to analyze because the 
y’s are equally correlated, under the null hypothesis. The y-contrasts 
corresponding to the above n-contrasts are 

= Yis + Ysi — Yor — Yos 
lL, = + Yis + Yar — Yrs — Yar — 2Ys2 
ls = You + Yos — 2Yi2 — + Yis + Yas 
Yis + Yos — — Yor - 
These four contrasts are orthogonal to the contrast involved in $\chi_1^2$:


$l_5 = y_{12} + y_{23} + y_{31} - y_{13} - y_{32} - y_{21}$.


H. O. Lancaster in reviewing the manuscript has shown how to 
derive the results in Section 2 by use of these orthogonal contrasts. 
A simple method of constructing these orthogonal sets is to construct 
an orthogonal set for varieties, based on an r X r coefficient matrix, 
say A, and another orthogonal set for rankings, also based on an r X r 
coefficient matrix, say B. Then the coefficient matrix of the vector of 
the $n_{ij}$, $N$ ($r^2 \times 1$), is simply the direct product $A \times B$. The first
rows of A and B will be (1, 1, ..., 1). Hence the sums of all other
rows must be zero. This procedure is often used to construct orthogonal 
treatment contrasts for factorial experiments. 
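
The direct-product construction can be sketched in a few lines of Python. The particular matrices A and B below (a linear and a quadratic row after the row of ones) are a hypothetical choice made for illustration; the text does not prescribe them. Columns of the product are ordered $n_{11}, n_{12}, n_{13}, n_{21}, \ldots, n_{33}$.

```python
# Sketch: orthogonal contrasts for the 3 x 3 table via a direct (Kronecker)
# product of a varietal coefficient matrix A and a ranking coefficient
# matrix B. The specific rows chosen for A and B are illustrative assumptions.

def direct_product(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def dot(u, v):
    """Ordinary inner product of two coefficient vectors."""
    return sum(x * y for x, y in zip(u, v))


A = [[1, 1, 1], [-1, 0, 1], [1, -2, 1]]   # varieties: ones, linear, quadratic
B = [[1, 1, 1], [-1, 0, 1], [1, -2, 1]]   # rankings:  ones, linear, quadratic

C = direct_product(A, B)   # 9 rows, each a coefficient vector for the n_ij

# Every pair of distinct rows is orthogonal because A and B each have
# mutually orthogonal rows.
orthogonal = all(dot(C[i], C[j]) == 0
                 for i in range(9) for j in range(9) if i != j)
print(orthogonal, C[4])
```

With this choice the row formed from the two linear rows, `C[4]`, is the contrast $n_{11} - n_{13} - n_{31} + n_{33}$ used earlier in this section.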

In conclusion, it should be mentioned that Friedman [1940] gives 
exact 1% and 5% significance levels of S for r = 3(1)7 and n = 3(1)6, 
8, 10, 15, and 20, plus other results for r = 3. Kendall [1955] gives 
exact probabilities of obtaining given values of S under the null hypothesis
for r = 3, n = 2(1)10; r = 4, n = 2(1)6. Kendall advances a coefficient
of concordance, $W = 12S/n^2 r(r^2 - 1)$, $0 \le W \le 1$.


3.3 A likelihood procedure for rank analysis of triple comparisons


Pendergrass and Bradley [1960] have developed a likelihood procedure
for rank analysis of triple comparisons. Let $\pi_1$, $\pi_2$, and $\pi_3$
be scale parameters, such that $\sum_i \pi_i = 1$. $r_{ic}$ is the rank given the
$i$th variety by the $c$th consumer ($r_{ic}$ = 1, 2, or 3). Then the probability
that $(r_{ic} < r_{jc} < r_{kc})$ is $\pi_i^2 \pi_j / \Delta_{ijk}$, where


$$\Delta_{ijk} = \pi_i^2(\pi_j + \pi_k) + \pi_j^2(\pi_i + \pi_k) + \pi_k^2(\pi_i + \pi_j).$$

The $i$th maximum likelihood equation is


$$\frac{a_i}{p_i} - \frac{n[2p_i(p_j + p_k) + p_j^2 + p_k^2]}{D} = 0,$$


where $\hat{\pi}_i = p_i$, $D = \hat{\Delta}$, $a_i = 3n - \sum_c r_{ic}$, and n = 123. A first
approximation for the solution of these equations is obtained from


$$p_i^2(6a_i - 12n) - 4np_i + 2a_i = 0.$$


For the data in Table 1, $a_1 = 148$, $a_2 = 78$, and $a_3 = 143$; approximate
values of the p's are: $p_1 = .398$, $p_2 = .215$, and $p_3 = .387$. For large
samples, when $\pi_i = \frac{1}{3}$,


$$T = 2n \ln 6 + 2 \sum_i a_i \ln p_i - 2n \ln D$$


is distributed as $\chi^2$ with two degrees of freedom. For the data in Table 1,
T = 25.4. We note that this test gives results almost identical to
those obtained using Friedman's test.
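
As a rough numerical check, the following Python sketch evaluates the likelihood-ratio statistic from the quoted $a_i$ and $p_i$. It assumes the model is read as: the ordering (i first, j second, k third) has probability $\pi_i^2 \pi_j / \Delta$ with $\Delta = \sum \pi_i^2(\pi_j + \pi_k)$, so that the log-likelihood is $\sum_i a_i \ln \pi_i - n \ln \Delta$. The first-approximation routine solves the quadratic $p^2(6a - 12n) - 4np + 2a = 0$ for its root in (0, 1). All function names are hypothetical.

```python
import math

# Sketch of the likelihood-ratio calculation for three varieties, under the
# reading of the model stated in the lead-in (an assumption of this example).

def delta(p):
    """Denominator D of the triple-comparison model for three varieties."""
    p1, p2, p3 = p
    return p1 ** 2 * (p2 + p3) + p2 ** 2 * (p1 + p3) + p3 ** 2 * (p1 + p2)


def likelihood_ratio_T(a, p, n):
    """T = 2n ln 6 + 2 sum(a_i ln p_i) - 2n ln D."""
    return (2 * n * math.log(6)
            + 2 * sum(ai * math.log(pi) for ai, pi in zip(a, p))
            - 2 * n * math.log(delta(p)))


def first_approximation(ai, n):
    """Root in (0, 1) of p^2 (6a - 12n) - 4n p + 2a = 0 (needs a != 2n)."""
    A, B, C = 6 * ai - 12 * n, -4 * n, 2 * ai
    disc = math.sqrt(B * B - 4 * A * C)
    roots = [(-B + disc) / (2 * A), (-B - disc) / (2 * A)]
    return next(r for r in roots if 0 < r < 1)


n = 123
a = (148, 78, 143)
p = (0.398, 0.215, 0.387)          # approximate estimates quoted in the text
T = likelihood_ratio_T(a, p, n)
start = [first_approximation(ai, n) for ai in a]
print(round(T, 1), [round(s, 3) for s in start])
```

The starting values from the quadratic do not sum exactly to one; rescaling them to unit sum is one simple option, though the text does not say how the quoted estimates were refined.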


3.4 Use of scores 


Another experimental procedure would be to use a scoring system, 
in which each consumer assigns a score to each variety. The scores 
could range from 1 to 5, 1 to 10, etc. Of course ties would be allowed. 
The variation among the resultant scores could then be summarized 
in an analysis of variance table, as follows: 


Source of        Degrees of                     Mean
Variation        Freedom                        Square
Consumers        n - 1                          MSC
Varieties        r - 1 = 2                      MSV
Error            (r - 1)(n - 1) = 2(n - 1)      MSE


The hypothesis of no differences among the varieties ($H_0$) could be
tested by use of F = MSV/MSE, which is approximately distributed
as F (under $H_0$) with 2 and 2(n - 1) degrees of freedom. Many studies
have shown that non-normality is not a serious problem here. 
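
The analysis of variance above can be sketched as follows. The scores are hypothetical (three consumers, three varieties) and serve only to show the arithmetic; the function name is an assumption of this example.

```python
# Sketch: two-way analysis of variance (consumers x varieties, one score per
# cell) giving MSC, MSV, MSE and F = MSV / MSE. The data are made up.

def block_anova(scores):
    """scores[c][v] is the score given by consumer c to variety v."""
    n = len(scores)            # number of consumers
    r = len(scores[0])         # number of varieties
    total = sum(sum(row) for row in scores)
    correction = total ** 2 / (n * r)
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_cons = sum(sum(row) ** 2 for row in scores) / r - correction
    ss_var = sum(sum(row[v] for row in scores) ** 2
                 for v in range(r)) / n - correction
    ss_err = ss_total - ss_cons - ss_var
    msc = ss_cons / (n - 1)
    msv = ss_var / (r - 1)
    mse = ss_err / ((r - 1) * (n - 1))
    return msc, msv, mse, msv / mse


# Hypothetical scores: rows are consumers, columns are varieties.
scores = [[1, 2, 4],
          [2, 3, 3],
          [3, 5, 5]]
msc, msv, mse, f_ratio = block_anova(scores)
print(msc, msv, mse, f_ratio)
```

The resulting ratio would be referred to the F distribution with r - 1 and (r - 1)(n - 1) degrees of freedom, as described above.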
A UNIFIED THEORY FOR QUANTAL RESPONSES TO
MIXTURES OF DRUGS: NON-INTERACTIVE ACTION


P. S. HEWLETT
Pest Infestation Laboratory, Slough, England
AND
R. L. PLACKETT
Department of Applied Mathematics, Liverpool University
Liverpool, England


1. INTRODUCTION 


In 1952 we considered how the quantal responses in groups of 
organisms can be expressed as functions of the doses of two poisons 
administered together, and what we wrote applies to drugs in general. 
Our aim was a general theory for the interpretation of data of this
kind, a theory properly related to the underlying mechanisms of joint 
drug action. So far as we know, advance in this field has since been 
confined to Ashford’s [1958] alternative method of deriving an equation 
for simple similar action. Sampford [1952] made progress in the allied 
field of response-time distributions for drug mixtures. Despite criti- 
cisms (see discussion) we maintain that our approach to the problem 
was sound, though we now think that it was not sustained adequately
because, as explained below, the results lacked unity. This paper is 
the first of a series in which a unified theory is presented. Many of 
the equations obtained before remain, but as special cases of new ones; 
the latter clear away difficulties, and should enable a wider variety 
of data to be interpreted. 

The biological mechanism of joint action can vary according to 
the pair of drugs, and the relation between the quantal response and 
the jointly applied doses differs according to this mechanism. Thus 
in our previous paper (Plackett and Hewlett [1952]) we set up biolog- 
ical models of joint action, and deduced the relations, i.e. mathematical 
models, from them. The set of biological models sprang from a two- 
way classification. The joint action was defined as similar or dissimilar 
according as the sites of primary action in the organism were the same
or different, and as interactive or non-interactive according as one drug
influenced or did not influence the biological action of the other. This 
table shows our terms for the resultant four types of joint action: 


                    Similar            Dissimilar
Non-interactive     Simple similar     Independent
Interactive         Complex similar    Dependent


In dissimilar joint action the actions of the respective drugs might 
in fact be separate in time rather than (or as well as) separate bio- 
chemically, an idea for which there is experimental evidence (Turner 
and Bliss [1953], Hewlett unpublished). Of non-interaction Ashford
[1958] states, “This means that the relative change in the probability
of response (and thus in the equivalent deviation) associated with a
small change in the dose of any one poison is independent of the dose
of any of the other poisons.” This mathematical interpretation happens
to be true for simple similar action, but is not true for independent 
action in general. We shall use the terms non-interaction and inter- 
action in the biological sense in which we used them before. 

The theory we developed was cleft by a rigid separation of similar 
and dissimilar joint actions. The mathematical model proposed for 
complex similar action was a generalization of that for simple similar, 
and that for dependent action a generalization of that for independent. 
However, the models for similar and dissimilar actions allowed no 
intermediate forms, i.e. partially similar action, but biological experience 
suggests that the action of certain drugs may be partially similar. 
The fact that each kind of interactive action in the table above could 
be regarded as a generalization of the corresponding kind of non- 
interactive action shows that unification of the models for simple 
similar and independent action will unify the whole theory. This 
paper considers non-interactive action afresh; a preliminary publication 
(Hewlett and Plackett [1957]) outlined some of the results. Later 
papers will consider interactive action and give worked examples of 
the application of the theory to experimental data. Although for 
convenience of presentation we consider the joint action of two drugs 
in terms of normal equivalent deviations (N.E.D.’s), the generalization 
of the equations to several drugs is obvious, and the principles for 
use of other types of equivalent deviation will be clear.

The procedure here is to infer the conditions for non-response in 
the individual organism for every form of non-interactive joint action.
(In 1952 we founded the equation for independent action on a condition
for individual non-response, but that for simple similar action on the 
response of a group of organisms as a whole.) In treating non-interactive 
action in general, the conditions for individual non-response are, as 
it were, superimposed upon a bivariate distribution of tolerances for 
the two drugs. This leads to consistent mathematical models that 
permit any degree of correlation of tolerances with any degree of simi- 
larity between the biochemical or physiological actions of the two 
drugs. In particular, these models permit incomplete correlation of 
tolerances for two drugs acting similarly, whereas previous models 
(Bliss [1939], Finney [1952a], Plackett and Hewlett [1952]) permitted 
only their complete positive correlation in this type of joint action. 
Similar action with incomplete correlation of tolerances may at first 
seem contradictory, but we show later that it is likely to occur in fact. 

The normal equivalent deviation (or probit, logit, etc.) of the 
quantal response to a drug applied alone can generally be taken as 
linear in log-dose, and if two drugs are very similar chemically their 
respective N.E.D.-log-dose lines are usually parallel, or not significantly 
otherwise. A belief has grown up that if two drugs act similarly these 
lines must (apart from sampling errors) be parallel (Trevan [1927], 
Bliss [1939, 1954], Dimond et al. [1941], Horsfall [1945]). This belief
has been fostered partly by failure to develop a consistent mathematical
model for similar joint action that permitted non-parallel lines for the
separate applications (Bliss [1939], Finney [1952a]). In 1952 we quoted
examples of insecticides closely related chemically, but with non- 
parallel N.E.D.-log-dose lines. We suggested that the differences arose 
from differences in the mode of transmission from the site of dosage 
to a common site of action; but in developing a model of similar joint 
action for such drugs we were unable to avoid the hypothesis of an 
underlying parallelism, namely, in the lines relating N.E.D. to the 
logarithms of the amounts of drug acting. Finney [1952b] questioned 
whether differences in transmission to a common site of action provided 
a satisfactory explanation for the differences in slope. Turner [1955] 
has since published data for quantal responses to certain chemically 
similar insecticides, for which the differences in slope were so large 
as to seem inexplicable by differences in transmission. 

The approach adopted in this paper gives a mathematical model 
for simple similar action which, irrespective of the degree of correlation 
of tolerances, does not restrict the relative values for the slopes of the 
lines relating the N.E.D. of response to the logarithm of the dose or 
of any dependent quantity such as the amount of drug reaching its 
site of action. A later paper will show that this model, with an assumption
of complete correlation of tolerances, appears to account for Turner’s
data for the joint toxicities of the insecticides giving the non-parallel 
lines when applied singly. 

As a preliminary to the main part of this paper we have now to 
discuss two aspects of the responses to drugs applied separately, namely, 
the general mechanism of drug action and the correlation of tolerances. 


2. THE ACTION OF A SINGLE DRUG 


The study of joint drug action requires a general biological picture 
of the way in which one drug applied singly produces a response in 
an individual organism. We adopt here a concept of drug action based 
on Veldstra’s [1956], with one modification. When a dose of a drug 
is introduced into the complex of systems forming an organism, only 
a part of the dose reaches the site of action. This part produces the 
biochemical and physiological changes which, if great enough, lead to 
the particular quantal response under consideration. The remainder 
goes to what Veldstra terms “‘sites of loss.’”’ Thus, of the part of the 
dose not acting some may remain stored in the tissues, without con- 
tributing to the action; some may be metabolized enzymically to a 
less active, or inactive substance; and some may be excreted either 
unchanged or as metabolites. When metabolized a drug is usually 
transformed to a substance less biologically active, but examples are 
known in which the reverse occurs, i.e. the metabolite is more active 
than the drug itself, an eventuality not included in Veldstra’s scheme. 
Indeed some drugs if unmodified are biologically inactive, and owe 
their apparent activity to that of their metabolites. A given metabolic 
modification may be highly specific to the drug, i.e. whether it occurs 
may depend on the detailed molecular structure of the drug, as may 
its activity. Tissue storage and excretion are generally less specific. 

The usual assumption is made here that an individual organism 
shows a quantal response to a drug if the dose exceeds its tolerance, 
a quantity characteristic of the individual at the time at which the 
drug acts on it (for a recent discussion, see Hewlett and Plackett [1956]).
The tolerance for one drug varies from one individual to another in 
a population of organisms, and evidently a number of factors determine 
the tolerance of an individual. Presumably for the response to occur 
a certain minimum amount of drug (or active metabolite) must reach 
the site of action in the individual organism, a minimum which is 
characteristic of the individual, and will be called the action tolerance; 
but the tolerance (which is in terms of dose) will be determined in 
addition by the capacities of the sites of loss, and of active metabolite 
production, if any. These capacities doubtless vary according to the
individual, and cannot be assumed necessarily to be highly correlated.
For example, if one individual can excrete more of a drug than another
can, the first might none the less be capable of metabolizing less than
the second can.

We shall assume that for the individual organism the expression


$$w = \mu z^{\eta} \qquad (\mu, \eta > 0) \qquad (1)$$


relates with sufficient accuracy the amount of drug acting, w, to the
dose, z, over adequate ranges. w cannot exceed z, and for (1) w < z
when $z > \mu^{1/(1-\eta)}$ if $0 < \eta < 1$, $z < \mu^{1/(1-\eta)}$ if $\eta > 1$. Previously [1952]
we laid stress on the possibility that $\eta$ (which might be called a “penetration
parameter”) is not necessarily the same for one drug as for
another. The purpose of this was to permit differences between the
slopes of the respective N.E.D.-log-dose lines for drugs acting similarly,
for it then appeared that the respective lines relating N.E.D. to log
w were necessarily parallel for such drugs. As already explained, this
parallelism is no longer a necessary assumption. Possibly $\eta = 1$ will
often be an adequate approximation, i.e. in the individual organism
w can often be taken as directly proportional to z. However, in a
later paper we shall examine data for which there is good reason to
believe that $\eta < 1$.
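
The dose range over which (1) respects the requirement w < z can be illustrated numerically. The sketch below assumes the power-law reading $w = \mu z^{\eta}$ of (1); the parameter values and function names are hypothetical.

```python
# Sketch: amount acting under w = mu * z**eta and the dose at which w = z.
# Parameter values are made up for illustration.

def amount_acting(z, mu, eta):
    """Amount of drug acting, w = mu * z**eta, for a dose z > 0."""
    return mu * z ** eta


def crossover_dose(mu, eta):
    """Dose at which w = z, namely mu**(1 / (1 - eta)); undefined for eta = 1."""
    if eta == 1:
        raise ValueError("for eta = 1 the ratio w/z equals mu at every dose")
    return mu ** (1.0 / (1.0 - eta))


mu, eta = 0.5, 0.5                  # hypothetical values with 0 < eta < 1
z_star = crossover_dose(mu, eta)    # w < z only for doses above this value
for z in (0.1, z_star, 1.0):
    print(z, amount_acting(z, mu, eta))
```

For $0 < \eta < 1$ the model is usable only above the crossover dose, and for $\eta > 1$ only below it, which is why the text restricts (1) to "adequate ranges".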

Owing to variation from one individual to another in the total
capacity of the sites of loss, $\mu$ and $\eta$ may vary within a population of
organisms. The theory to be put forward allows for variation in $\mu$
(which introduces no complication), but not for variation in $\eta$. Should
variation in $\eta$ be shown to be important, the theory could be modified
to accommodate it, but until this is shown, taking $\eta$ as fixed seems
to be a reasonable working assumption. Suppose the N.E.D. of response*,
x, to be given by


$$x = \alpha' + \theta' \log w \qquad (2)$$

if w were exactly controlled. Suppose also that

$$x = \alpha + \beta \log z. \qquad (3)$$


A distinction has to be made between $\theta'$ and $\theta = \beta/\eta$. $\theta' = \theta$ only
if the value of z exactly determines the value of w, i.e. if $\mu$ does not
vary with the individual organism. If, for example, $\log \mu$ was normally
distributed with variance $\gamma^2$, and was uncorrelated with the action
tolerance, considerations mentioned by Finney [1952a, p. 170] show
that

$$\theta^{-2} = \theta'^{-2} + \gamma^2. \qquad (4)$$


If, as usual, the values of z, but not of w, are known, it is the value of $\theta$,
and not of $\theta'$, that is important in formulating the responses to jointly
applied doses of drugs (cf. Plackett and Hewlett [1952]). In a preliminary
publication (Hewlett and Plackett [1957]) the quantity defined
here as $\theta'$ was referred to as $\theta$.


*Whether the symbol x or y is used for N.E.D., the convention that y and x represent dependent
and independent variables respectively cannot be adhered to, because N.E.D. are dependent variables
in some equations and independent in others. The use of x for N.E.D. may help to avoid their
confusion with probits, usually represented by y.


3. CORRELATION OF TOLERANCES 


Each organism in a population is assumed to have a tolerance for 
each of two drugs applied separately, giving rise to a bivariate distri- 
bution of tolerances, with a greater or lesser degree of correlation between 
the tolerances for the respective drugs. If the two drugs are very 
similar chemically, they may act in the same way at the same site of 
action, and be stored, excreted, and metabolized in the same way; 
under these circumstances the correlation of tolerances could be expected 
to be completely positive, or nearly so. However, two drugs could 
be similar chemically in such respects that they had the same site of 
action, but could differ in such a way that one, but not the other, was 
metabolized to an inactive substance at a site of loss; the tolerance 
would then depend partly on the action tolerance and partly on the 
capacity of this site of loss, and so the correlation of tolerances could 
then be incomplete, despite identical sites of action. If two drugs had
different sites of action and different sites of loss, the correlation might 
be low. On the other hand, if the sites of action were different, but the 
tolerances depended largely on the capacities of non-specific sites of 
loss, the correlation of tolerances could be strongly positive. 

It is even possible to envisage a mechanism for the negative cor- 
relation of tolerances. This would be expected if the same enzyme 
system could promote the metabolism of one drug to a metabolite 
more active than this drug, and the metabolism of another drug to a 
metabolite less active than this second drug—assuming, of course, that 
the tolerances for both drugs depended predominantly on the capacity 
of the enzyme system concerned. The sites of action of the drugs 
might be the same or different. In considering independent action, 
Plackett and Hewlett [1948] allowed for the possibility of negative 
correlation of tolerances, but at the time were unable to suggest a 
mechanism for it, and the concept met with scepticism (Babers and 
Pratt [1951]). 

The foregoing remarks do not cover all the possibilities, but they
serve to show that the correlation of tolerances for drugs applied separately
need not depend on whether the site of action is the same.


4. GENERAL EQUATIONS FOR NON-INTERACTIVE ACTION


If $\hat{w}$ is the action tolerance for one drug, an individual organism
responds if $w > \hat{w}$, and fails to respond if $w < \hat{w}$. Using suffixes 1
and 2 to refer to the two drugs, the proportions of organisms $q_1$ and $q_2$
failing to respond to separate applications of the drugs are respectively
the probabilities that $w_1 < \hat{w}_1$ and $w_2 < \hat{w}_2$. Thus


$$q_1 = \Pr\{w_1 < \hat{w}_1\} \qquad (5)$$


and

$$q_2 = \Pr\{w_2 < \hat{w}_2\}. \qquad (6)$$


Obviously any equation for the joint action of two drugs must reduce
to (5) when $w_2$ is zero, and to (6) when $w_1$ is zero.

If the action of two drugs jointly applied is independent, an organism
will fail to respond only when neither quantity acting exceeds the
corresponding action tolerance (see Plackett and Hewlett [1952]), and
hence q, the proportion not responding to a joint application, is given
by


$$q = \Pr\{w_1 < \hat{w}_1,\ w_2 < \hat{w}_2\}. \qquad (7)$$

Equivalently, the proportion responding is

$$p = \Pr\{w_1 > \hat{w}_1,\ w_2 < \hat{w}_2\} + \Pr\{w_1 < \hat{w}_1,\ w_2 > \hat{w}_2\} + \Pr\{w_1 > \hat{w}_1,\ w_2 > \hat{w}_2\}.$$


Since q can here be more succinctly expressed than p, we use q in formulating
models for non-interactive action in general.

In deriving equations for similar action obtained previously, the 
same type of argument has been used, whether applied to doses (Bliss 
[1939], Finney [1952a]) or to amounts acting (Plackett and Hewlett 
[1952]). We now recapitulate the essence of the argument and then 
show why it is unsatisfactory. If at the common site of action, the
first drug is k times as active as the second, $w_1$ of the first drug acting
will have the same physiological effect as $kw_1$ of the second. Thus,
if both act similarly, $w_1$ of the first acting together with $w_2$ of the second
can be expected to have a physiological effect equal to $(kw_1 + w_2)$
of the second alone. In view of (6) an equation for similar joint action
is obtained as


(8) 


$$q = \Pr\{kw_1 + w_2 < \hat{w}_2\}. \qquad (9)$$




The alternative form,


$$q = \Pr\{w_1 + w_2/k < \hat{w}_1\}, \qquad (10)$$


that is obtained by considering the activity of the second drug in terms
of the first, gives the same value for q as does (9). Appropriate assumptions
concerning the distributions of $\hat{w}_1$ and $\hat{w}_2$ yield an equation of
simple form for the evaluation of q (Equation (29) of Plackett and
Hewlett [1952]).

Equations (9) and (10) are very restrictive, much more so than
they need be in order to embody the idea of similar action. k is in
effect a relative potency applied to amounts acting, and, if response
metameters corresponding to $q_1$ and $q_2$ are linear in $\log w_1$ and $\log w_2$
respectively, the lines concerned must be parallel; k is a fixed quantity
in that it is applicable to every individual organism, and thus is applicable
to groups; in the derivation of (9) and (10), $\hat{w}_1$ and $\hat{w}_2$ are assumed
to be completely and positively correlated, and, more specifically still,
to be such that $\hat{w}_2/\hat{w}_1 = k$. However, the concept of similar action
merely implies that the physiological effects, leading to the quantal
response concerned, of the two drugs are additive; and clearly this
concept is applicable to the individual organism.

What is needed, then, is some quantity, relevant to the quantal
response concerned, such as measures the physiological effect, leading
to the response, of a drug on the individual. Now the ratio $w/\hat{w} = \delta$
fulfills the requirements, and plays an important role in the theory
to be put forward. It rises from zero when $w = 0$ to unity when $w = \hat{w}$,
i.e. when the response of the individual occurs. Moreover, the ratio
appears suitable for the measurement of the physiological effect in
similar (or partially similar) action for the following reason. If w is
arbitrarily divided into two parts so that


$$w = w' + w'', \qquad (11)$$

then


$$\delta = \delta' + \delta'' \qquad (12)$$


where $\delta' = w'/\hat{w}$ and $\delta'' = w''/\hat{w}$. Thus the measurement of physiological
effect in this way means that the total effect of an amount acting
can be regarded as the sum of the effects of its parts, and that the
physiological action of one drug can be regarded as similar to that of
itself. This way of measuring the effect is compatible with the concept
of a quantal response as indicating that an underlying graded response
has reached a given level in the individual (Hewlett and Plackett
[1956]).

Equations (5) and (6) can now be written

$$q_1 = \Pr\{\delta_1 < 1\} \qquad (13)$$

$$q_2 = \Pr\{\delta_2 < 1\}. \qquad (14)$$


The total physiological effect due to two drugs that act similarly is
measured by $(\delta_1 + \delta_2)$, and thus a general equation for similar joint
action,


$$q = \Pr\{\delta_1 + \delta_2 < 1\}, \qquad (15)$$


is arrived at.


The relation that (15) bears to (9) is easily seen, for (15) can be
written


$$q = \Pr\{(\hat{w}_2/\hat{w}_1)w_1 + w_2 < \hat{w}_2\}. \qquad (16)$$


Evidently $\hat{w}_2/\hat{w}_1$ in (16) replaces k in (9), and is a form of relative
potency applicable to the individual concerned, but may vary from one
individual to another. Thus (9) is a special case of (15) in which
$\hat{w}_2/\hat{w}_1 = k$, a fixed quantity. (15) implies no restriction on the relative
slopes of the lines relating the metameters of $q_1$ and $q_2$ to $\log w_1$ and
$\log w_2$ respectively.

Using (13) and (14), equation (7) for independent action becomes


$$q = \Pr\{\delta_1 < 1,\ \delta_2 < 1\}. \qquad (17)$$


Any general equation for non-interactive joint action must give (15)
and (17) as extreme special cases. The possibilities are unlimited but
simple general equations include


$$q = \Pr\{\nu\delta_1 + \delta_2 < 1,\ \delta_1 + \nu\delta_2 < 1\} \qquad (18)$$

and


$$q = \Pr\{\delta_1^{1/\lambda} + \delta_2^{1/\lambda} < 1\}. \qquad (19)$$


For the purposes of non-interactive joint action, in (18) $0 \le \nu \le 1$
and the equation reduces to (15) or (17) as $\nu = 1$ or 0; in (19) $0+ \le \lambda \le 1$,
and the equation reduces to (15) or (17) as $\lambda = 1$ or $\lambda \to 0$. The parameters
$\nu$ and $\lambda$ measure the degree of similarity between the actions
of the two drugs; $\nu = 0$ or $\lambda \to 0$ represents independence in action,
$\nu = 1$ or $\lambda = 1$ represents complete similarity, and intermediate values
of the parameters partial similarity. Fig. 1 illustrates the way in
which (18) and (19) define the conditions of non-response.
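
The two families of non-response conditions can be written as simple predicates on the pair $(\delta_1, \delta_2)$. The Python sketch below takes (18) as the pair of inequalities $\nu\delta_1 + \delta_2 < 1$, $\delta_1 + \nu\delta_2 < 1$ and (19) as $\delta_1^{1/\lambda} + \delta_2^{1/\lambda} < 1$, with the limit $\lambda \to 0$ treated as $\max(\delta_1, \delta_2) < 1$. The function names are hypothetical.

```python
# Sketch: conditions of non-response for an individual organism, expressed in
# terms of delta_i = w_i / (action tolerance). Names are illustrative only.

def no_response_18(d1, d2, nu):
    """Condition (18): nu*d1 + d2 < 1 and d1 + nu*d2 < 1."""
    return nu * d1 + d2 < 1 and d1 + nu * d2 < 1


def no_response_19(d1, d2, lam):
    """Condition (19): d1**(1/lam) + d2**(1/lam) < 1.

    lam = 0 is used here to stand for the limit lam -> 0, in which the
    condition becomes max(d1, d2) < 1 (independent action).
    """
    if lam == 0:
        return max(d1, d2) < 1
    return d1 ** (1.0 / lam) + d2 ** (1.0 / lam) < 1


point = (0.6, 0.6)   # each drug alone is below tolerance
for nu in (0, 0.5, 1):
    print("nu =", nu, no_response_18(*point, nu))
for lam in (0, 0.5, 1):
    print("lambda =", lam, no_response_19(*point, lam))
```

At the point (0.6, 0.6) the organism fails to respond under independent action and under both intermediate cases shown, but responds under complete similarity, since the combined effect 1.2 exceeds unity.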

In (18) and (19) $-\infty < \nu \le 1$ and $0+ \le \lambda < +\infty$ give consistent
results in the sense that the equations then reduce to (5) or (6) as
$w_2$ or $w_1$ is made zero. However the range of 0 to 1 is sufficient to
cover the full range of non-interactive actions, and values below 0 or
above 1 will be discussed when we consider interactive joint action.


FIGURE 1
ILLUSTRATION OF THE CONDITIONS OF NON-RESPONSE DEFINED BY (18) AND (19)
For (18) and $\nu = \frac{1}{2}$, the individual organism does not respond if the point $(\delta_1, \delta_2)$
lies to the left of and below the angled boundary indicated; for (19) and $\lambda = \frac{1}{2}$, if to
the left of and below the quadrant. For similar action the boundary is the broken
line $\delta_1 + \delta_2 = 1$, and for independent action the broken lines $\delta_1 = 1$, $\delta_2 = 1$.


5. EVALUATION OF q 


Equations (18) and (19) are very general and by themselves do
not permit evaluation of q. To do this additional assumptions have
to be made, namely concerning the joint distribution of the tolerances
for the two drugs. The log-tolerances for one drug are here assumed
to be normally distributed [Equation (3)], and so we assume that the
joint distribution of the log-tolerances is bivariate normal, as we did
in discussing independent action (Plackett and Hewlett [1948, 1952]).

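In view of (1), for each drug


$$\delta_i = (z_i/\hat{z}_i)^{\eta_i}. \qquad (i = 1, 2) \qquad (20)$$

From (3) we may write


$$x_i = \alpha_i + \beta_i \log z_i \qquad (21)$$

and

$$-u_i = \alpha_i + \beta_i \log \hat{z}_i, \qquad (22)$$

so that

$$\delta_i = h^{(x_i + u_i)/\theta_i}, \qquad (23)$$

where

$$\theta_i = \beta_i/\eta_i \qquad (24)$$

and h is the base of the logarithms.


Assuming that the joint distribution of log-tolerances is bivariate
normal, with correlation coefficient $\rho$,


$$q = \frac{1}{2\pi\sqrt{1 - \rho^2}} \iint_R \exp\{-(u_1^2 - 2\rho u_1 u_2 + u_2^2)/2(1 - \rho^2)\}\, du_1\, du_2. \qquad (25)$$

For (18) R is a region defined by

$$h^{(x_1+u_1)/\theta_1} + \nu h^{(x_2+u_2)/\theta_2} < 1, \qquad \nu h^{(x_1+u_1)/\theta_1} + h^{(x_2+u_2)/\theta_2} < 1. \qquad (26)$$

For (19) R is a region defined by

$$h^{(x_1+u_1)/\lambda\theta_1} + h^{(x_2+u_2)/\lambda\theta_2} < 1. \qquad (27)$$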

Fig. 2 shows the boundaries of integration in the $(u_1, u_2)$ plane for
the particular values $x_1 = -2$, $x_2 = -1$, $\theta_1 = 4.5$, $\theta_2 = 8$, $h = 10$.
It shows the boundaries for $\nu = 0$ and $\nu = 1$, which are the same as
those for $\lambda \to 0$ and $\lambda = 1$ respectively, and the boundaries for $\nu = \frac{1}{4}$
and $\lambda = \frac{1}{2}$, which are different. In evaluating q, integration is carried
out over the region to the left of and below the boundaries shown
in Fig. 2.

The special cases of (25) for which q can be evaluated with a desk
calculating machine are as follows: $\nu = 0$, $\lambda \to 0$ with $-1 \le \rho \le +1$
(see Plackett and Hewlett [1948], Hewlett and Plackett [1950]); $0 < \nu$,
$\lambda \le 1$ with $\rho = +1, -1$. For remaining combinations of $\rho$ with $\nu$ or $\lambda$,
q can be evaluated by means of an electronic computer. Lipton [1959]
has devised a programme for evaluation of q using (25) and (26) with
the electronic computer at Rothamsted Experimental Station. The
programme using (25) and (27) would be simpler.
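
A modern equivalent of such a programme is easy to sketch. The Python function below approximates (25) over the region (26) by a midpoint rule on a square grid. It assumes $\delta_i = h^{(x_i + u_i)/\theta_i}$, $0 \le \nu \le 1$, and $-1 < \rho < 1$; the grid limits, step count, and function name are choices made for this illustration and have nothing to do with Lipton's programme.

```python
import math

# Sketch: midpoint-rule approximation to q in (25) over the region (26).
# Assumes delta_i = h**((x_i + u_i)/theta_i), 0 <= nu <= 1 and |rho| < 1.

def q_partial_similarity(x1, x2, theta1, theta2, nu, rho, h=10.0,
                         limit=8.0, steps=400):
    if not -1 < rho < 1:
        raise ValueError("this sketch handles only -1 < rho < 1")
    du = 2 * limit / steps
    norm = 1.0 / (2 * math.pi * math.sqrt(1 - rho ** 2))
    total = 0.0
    for i in range(steps):
        u1 = -limit + (i + 0.5) * du
        d1 = h ** ((x1 + u1) / theta1)
        if d1 >= 1:       # d1 grows with u1, so no later cell can qualify
            break
        for j in range(steps):
            u2 = -limit + (j + 0.5) * du
            d2 = h ** ((x2 + u2) / theta2)
            if d2 >= 1:   # likewise for d2 along u2
                break
            if d1 + nu * d2 < 1 and nu * d1 + d2 < 1:
                quad = (u1 * u1 - 2 * rho * u1 * u2 + u2 * u2) \
                    / (2 * (1 - rho ** 2))
                total += math.exp(-quad)
    return norm * total * du * du


# Parameter values quoted for Fig. 2, with uncorrelated tolerances.
q_indep = q_partial_similarity(-2, -1, 4.5, 8, nu=0.0, rho=0.0)
q_half = q_partial_similarity(-2, -1, 4.5, 8, nu=0.5, rho=0.0)
q_similar = q_partial_similarity(-2, -1, 4.5, 8, nu=1.0, rho=0.0)
print(q_indep, q_half, q_similar)
```

For $\nu = 0$ and $\rho = 0$ the result should agree with the product of the two single-drug non-response proportions, which gives a convenient check; increasing $\nu$ shrinks the region of integration and so lowers q.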
FIGURE 2


EXEMPLARY BOUNDARIES OF INTEGRATION SHOWN IN THE $(u_1, u_2)$
PLANE OF THE BIVARIATE NORMAL SURFACE


The boundaries are indicated by continuous lines marked with the corresponding
values of $\nu$ or $\lambda$. For further explanation see text.


The models (26) and (27) predict the same value for q when $\nu = 1$
as when $\lambda = 1$ (as of course they do when $\nu = 0$ and $\lambda \to 0$). When
$\nu = 1$, $\lambda = 1$,


$$q = \Pr\{h^{(x_1+u_1)/\theta_1} + h^{(x_2+u_2)/\theta_2} < 1\}. \qquad (28)$$


When $\rho = +1$, the bivariate surface shrinks to a normal curve on the
line $u_1 = u_2$. Denote by x the N.E.D. corresponding to p = 1 - q.
Thus when $\nu = \lambda = 1$ and $\rho = +1$, x satisfies the equation

$$h^{(x_1 - x)/\theta_1} + h^{(x_2 - x)/\theta_2} = 1. \qquad (29)$$

(29) is an equation for similar joint action with complete positive
correlation of tolerances for which the mixture response is readily
calculated [see discussion for an ad hoc derivation of (29)]. The equation
for similar action that we gave previously [Plackett and Hewlett [1952],
(31) and (33)] is the special case of (29) for which $\theta_1 = \theta_2$. Where
the amounts of the drugs acting are directly proportional to doses
(29) becomes

$$h^{(x_1 - x)/\beta_1} + h^{(x_2 - x)/\beta_2} = 1. \qquad (30)$$

From (30) x can be found with parameters estimated from responses
to the drugs separately applied, but without the necessity for the
N.E.D.-log-dose lines for the separate drugs to be parallel. As we
shall show in a later paper, (30) accounts satisfactorily for data given
by Turner for the joint action of certain insecticides closely related
chemically, but for which $\beta_1 \ne \beta_2$.

The equation for similar action given by Finney [1952a] can be
shown to be (30) with the restriction that $\beta_1 = \beta_2$.
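
Under complete positive correlation the mixture N.E.D. can be found by solving a single equation in x. The Python sketch below assumes the equation has the form $h^{(x_1 - x)/\theta_1} + h^{(x_2 - x)/\theta_2} = 1$, which follows from (28) when $u_1 = u_2$, and solves it by bisection; with $\theta_i$ replaced by $\beta_i$ the same routine covers the case of amounts acting proportional to doses. The function name and tolerance are illustrative choices.

```python
import math

# Sketch: mixture N.E.D. for similar action with complete positive
# correlation of tolerances, found by bisection. Assumes the equation
# h**((x1 - x)/theta1) + h**((x2 - x)/theta2) = 1.

def mixture_ned(x1, x2, theta1, theta2, h=10.0, tol=1e-10):
    def f(x):
        return h ** ((x1 - x) / theta1) + h ** ((x2 - x) / theta2) - 1.0

    lo = max(x1, x2)      # f(lo) > 0: one term equals 1, the other is positive
    hi = lo + 1.0
    while f(hi) > 0:      # f decreases in x, so step right until the sign flips
        hi += 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Equal slopes: a closed form, theta * log_h(h**(x1/theta) + h**(x2/theta)),
# is available for comparison.
x_equal = mixture_ned(0.0, 0.0, 5.0, 5.0)
# Unequal slopes, using the theta values quoted for the lower panels of Fig. 3.
x_unequal = mixture_ned(0.5, 0.0, 7.0, 3.5)
print(x_equal, x_unequal)
```

The mixture N.E.D. always exceeds the larger of $x_1$ and $x_2$, and no parallelism of the separate lines is needed to compute it.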


6. DOSE-RESPONSE CURVES FOR MIXTURES 


Dose-response curves for mixtures of drugs are obtained when the 
total dose is varied and the ratio of the doses of the respective drugs 
is kept constant. The responses to mixtures can then be shown graph- 
ically in relation to the responses to the separate constituents (see 
Plackett and Hewlett [1948, 1952]). 

Fig. 3 shows N.E.D.-log-dose curves for two hypothetical special
cases in which $\eta_1 = \eta_2 = 1$. The curves for the mixtures are based
on (25) and (26). For any abscissa, each continuous curve shows the
value of x corresponding to the values of $x_1$ and $x_2$, which are indicated
wherever possible by the broken straight lines. However, where
$\nu = 0$, $\rho = +1$, the value of x is the same as the greater of $x_1$ and $x_2$, so
that the lines for the two latter are not then shown as broken. The
three upper panels of Fig. 3 show the case in which $x_2 = x_1 + 1.5$,
$\theta_1 = \theta_2 = 5$; the three lower show that in which $x_1 = 2x_2 + 0.5$, $\theta_1 = 7$,
$\theta_2 = 3.5$. For each case the figure shows mixture curves for $\nu = 0, \tfrac12, 1$;
$\rho = -1, 0, +1$.

The results shown in Fig. 3 are largely self-explanatory, but some
salient features may be pointed out. The lower the value of $\rho$ the
steeper is the mixture curve; this is reasonable because only for $\rho = -1$
does $q = 0$ for finite values of $x_1$ and $x_2$. The higher the value of $\nu$,
the higher is the mixture response. However, only for high values
does the mixture response necessarily increase with decreasing $\rho$; for
$\nu = 0$, the mixture mortality is always higher the less is $\rho$, but for
given $\nu > 0$, the mixture curves for different values of $\rho$ intersect one
another. Where $\rho = +1$ and $\theta_1 = \theta_2 = \theta$, the mixture curve is a straight
line of slope $\theta$; where $\rho = +1$, $\nu = 1$, and $\theta_1 \neq \theta_2$, over the range of x
shown, the mixture curve approximates to a straight line with slope
about $\tfrac12(\theta_1 + \theta_2)$.

FIGURE 3
EXEMPLARY DOSE-RESPONSE CURVES FOR NON-INTERACTIVE MIXTURES


7. DISCUSSION 


Three authors, Bliss, Taschdjian, and Claringbold, have criticized 
our paper of 1952. Since the present paper continues our approach 
of 1952 we now comment on their criticisms. Bliss [1954] questioned 
our extension of similar action to experiments on insecticides where 
the probit-log-dose regression lines are not parallel on the grounds that 
insect physiology is not yet completely understood; and he objected to 
our concept of dissimilar action with negative correlation, because it 
is not consistent with experimental results that have fallen within his 
experience. He evidently adheres to the set of three models of joint 
action that he proposed in 1939, and describes them as having been 
modified in detail by Finney [1952a] and ourselves [Plackett and Hewlett 
1952], but we regard this as an understatement. He describes as 
“synergism” the ability of two drugs to give responses exceeding those
predicted by his models for independent or similar actions, and in fact 
his main aim is apparently to set up criteria whereby the joint action 
of pairs of drugs can be diagnosed as “synergistic” or not by means 
of quantal response data; whereas our aim is wider, namely, to provide
a general theoretical basis for the interpretation of quantal response
data for mixtures. If a pair of drugs is “synergistic” in his sense, its
biological activity is not necessarily sufficient to be practically useful.
Moreover, his criteria are such that they may result in one pair of drugs
being designated as “synergistic” and another as not so when the
underlying mechanism of joint action is really the same for both pairs.
Thus, while not adequately serving practical ends, Bliss’s criteria may 
be misleading in attempting a deeper understanding of joint drug 
action. 

Taschdjian [1956] discussed the application of information theory 
to the interpretation of drug action, including joint action, but developed 
no equations that could be tested for agreement with response data. 
His criticism of the model of independent action, on the grounds that 
it does not allow for the possibility of interaction, seems to be based 
on a misunderstanding, for it is the model for dependent action that 
does this. 

Claringbold [1955] commented that our [1952] set of mathematical 
models are based on many assumptions and are difficult to fit to ex- 
perimental data. The model he proposed as an alternative contains 
parameters that are more numerous and of uncertain biological meaning. 
In fact the fitting of the simpler of our models, put forward now or 
in 1952, is not unduly difficult with a desk computer, and electronic 
computing will doubtless enable the more complex to be used. More- 
over, it may not be necessary to estimate simultaneously all of a set 
of parameters from a set of dose-response data, because other evidence 
may indicate reasonable estimates for some of them. For example,
if a drug reaches its site of action by some direct route, $\eta = 1$ may be
a reasonable supposition, or, alternatively, experiments on the adsorption
of the drug by tissues may indicate what value $\eta$ takes. Under
suitable circumstances quantal response data for the joint action of
drugs very closely related chemically may indicate the value of $\eta$
[Hewlett, unpublished].

A further comment on restrictions on the relative slopes of the 
N.E.D.-log-dose lines for separately applied drugs may be advisable. 
We assert that there is no necessity for drugs acting similarly to give 
parallel lines (even if they often do), and that the model [equation
(15)] for similar joint action is self-consistent, even though it does not
imply the parallelism. Bliss [1939, 1954] and Horsfall [1945] appeared
to think that similar physiological action necessitates parallel lines,
and, though their opinions on the point were not clear, they perhaps
intended to make the parallelism part of their definitions of similar
action. If so, their concepts of similar action cannot be regarded as
incorrect, but merely, we think, unsuitable because too restrictive.
Our conception of similar action concerns events occurring in an indi- 
vidual organism, but restrictions on the relative slopes of the lines 
imply that the characteristics of groups are involved. 

Equation (29), for similar joint action with complete positive corre- 
lation of tolerances, can be derived directly from (15). (15) can be 


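written

$$q = \Pr\left\{\left(\frac{z_1}{\hat z_1}\right)^{\eta_1} + \left(\frac{z_2}{\hat z_2}\right)^{\eta_2} < 1\right\}, \tag{31}$$

where $\hat z_1$ and $\hat z_2$ denote the individual's tolerances of the two drugs.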


Now when the tolerances are so correlated, an individual whose tolerance
of the first drug equals, say, the ED50 for the first drug alone will have a
tolerance of the second drug equal to the ED50 for the second drug alone.
Similar considerations hold for any other particular ED. Thus if $Z_1$ and $Z_2$
are the respective particular values of the tolerances for the separate drugs
giving a particular N.E.D. of response, x, the relation between the
doses $z_1$ and $z_2$ of the drugs applied jointly that give the response
x is

$$\left(\frac{z_1}{Z_1}\right)^{\eta_1} + \left(\frac{z_2}{Z_2}\right)^{\eta_2} = 1. \tag{32}$$

(3) gives
(i = 1, 2) (33)
(34)


Applying (24) and (32)—(34) to (31) gives (29). 

In pharmacology the joint action of two drugs is commonly described
as “addition” if (32) with $\eta_1 = \eta_2 = 1$ is satisfied. This relation is
applied to both quantal and graded responses, and in examining data
a diagram is normally used to determine whether it holds (Gaddum
[1949, 1952], Loewe [1928, 1953], Loewe and Muischnek [1926], Rentz
[1932]). Thus for quantal responses “addition” corresponds to the
special case of similar action [i.e. of equation (15)] in which the tolerances
are completely and positively correlated and the amounts of the
drugs acting are directly proportional to their respective doses.
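
The "addition" criterion can be checked numerically. The sketch below assumes that relation (32) has the form $(z_1/Z_1)^{\eta_1} + (z_2/Z_2)^{\eta_2} = 1$, which for $\eta_1 = \eta_2 = 1$ is the usual straight-line isobole $z_1/Z_1 + z_2/Z_2 = 1$. The function names and the tolerance argument are illustrative choices, not part of the paper.

```python
def addition_index(z1, z2, Z1, Z2, eta1=1.0, eta2=1.0):
    """Left-hand side of the assumed relation (32).

    z1, z2 : doses of the two drugs in the mixture
    Z1, Z2 : doses of each drug alone giving the same response
    eta1, eta2 : "penetration" parameters (1 means amount acting ~ dose)
    """
    if Z1 <= 0 or Z2 <= 0:
        raise ValueError("equally effective doses must be positive")
    return (z1 / Z1) ** eta1 + (z2 / Z2) ** eta2


def is_addition(z1, z2, Z1, Z2, tol=1e-9):
    """True when the mixture lies on the straight-line isobole (eta = 1)."""
    return abs(addition_index(z1, z2, Z1, Z2) - 1.0) <= tol
```

For example, half of each equally effective dose (`z1 = Z1/2`, `z2 = Z2/2`) gives an index of exactly 1; an observed index below 1 at the same response means the mixture is more effective than "addition" predicts.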

Although the physiological basis for the equations for the extreme 
forms of non-interactive action has been indicated, that for the general 
equations, (18) and (19), has not. A physiological interpretation for
(19) is not yet clear, but in (18) $\nu$ may signify the proportionate degree
of overlapping of the respective sites of action for the two drugs, or
of the respective periods of action, or of a combination of both. Alternatively,
the first drug may have a primary effect on one physiological
system and a side effect on another that is acted upon by the second
drug, while the second drug has a side effect on the system acted on
primarily by the first; if the total effects on the systems acted on
primarily by the first and second drugs are respectively $(\delta_1 + \nu\delta_2)$ and
$(\nu\delta_1 + \delta_2)$, equation (18) follows.

General equations for non-interactive action more complicated than 
(18) or (19) can easily be proposed, though the simpler equations are 
obviously to be preferred unless shown to be inadequate. General 


TABLE 1 
STATUS OF EQUATIONS OF JOINT ACTION PREVIOUSLY PROPOSED, IN THE
LIGHT OF THE PRESENT PAPER


Equation                Similarity       Correlation      N.E.D.-log-dose       “Penetration”
                        of action        of tolerances    slopes                parameters
                        $\nu$ or $\lambda$   $\rho$           $\beta_1$, $\beta_2$      $\eta_1$, $\eta_2$


Similar action | 
(Bliss [1939], | 
Finney [1952]) 

Simple similar action 
(Plackett and 1 + 1 Bing = Bom >0 
Hewlett [1952]) | 

Addition 

(Gaddum [1949]) 1 +1 * 1 

Independent action} 
(Bliss [1939]) 

Independent action | 
(Plackett and 
Hewlett [1948]) | 

(Dicke and 
Paul [1951]) | 

‘“‘Non-synergism”’ 
(Mather [1940]) 0 0 * * 

“Non-synergism”’ | 
(Gaddum [1949]) 0 +1 

Non-interactive 
action O<yA<1 -1l<p<t+1 >v 
(present paper) 


*Formally unrestricted, but can be taken as > 0.
†$0 \le \rho \le +1$ if Bliss's generalization is accepted. For discussion of this see Plackett and Hewlett
[1948].

equations each with two parameters are


q = Pr {v,6, + < 1, 6; + md, < 1}, O<», (35) 


/ 


(36) 


If $\nu_1 \neq \nu_2$, equation (35) might represent the situation in which the
sites of action so to say differ in size, and the overlap then forms a
different proportion of each. Obviously an interpretation of (35) in
terms of side effects, comparable to that given for (18), is also possible.
Finally, it may be helpful to tabulate various equations, proposed
previously for quantal responses to mixtures of drugs, showing which
special cases of equations (25)-(27) they represent. Most of the equations
were employed as standards whereby “synergism” was judged
to occur if the response to a mixture exceeded that calculated from the
equation concerned. Bliss [1939] proposed his equations for similar
and independent actions largely for this purpose. Dicke and Paul
[1951] took synergism to occur if $p > p_1 + p_2$ (assuming $p_1 + p_2 < 1$);
Gaddum [1949] if $p > p_1$ or $p_2$, whichever was greater; and Mather
[1940] if $q < q_1q_2$, i.e. if $p > p_1 + p_2 - p_1p_2$. Replacing these inequalities
by equalities gives the respective equations for independent
action with $\rho = -1$, $+1$, and $0$ (see Plackett and Hewlett [1948]).
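
The three "non-synergism" standards just described differ only in the assumed correlation of tolerances. As a minimal sketch, taking the three equalities stated above at face value (the function name is illustrative, and only the three extreme correlations are handled), the predicted mixture response can be computed as follows.

```python
def independent_action_response(p1, p2, rho):
    """Mixture response p under independent action for rho in {-1, 0, +1}.

    p1, p2 : proportions responding to each drug applied alone
    rho    : correlation of tolerances; only -1, 0 and +1 are handled here
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("p1 and p2 must be proportions in [0, 1]")
    if rho == -1:
        # Dicke and Paul [1951] standard; capped at 1 when p1 + p2 >= 1
        return min(1.0, p1 + p2)
    if rho == 0:
        # Mather [1940] standard: q = q1 * q2
        return p1 + p2 - p1 * p2
    if rho == 1:
        # Gaddum [1949] standard: the greater of the two responses
        return max(p1, p2)
    raise ValueError("only rho = -1, 0, +1 are covered by this sketch")
```

With `p1 = 0.2` and `p2 = 0.3` the three standards give 0.5, 0.44 and 0.3 respectively, so the same observed mixture response may be called "synergistic" under one standard and not under another.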


SUMMARY 


The joint action of drugs is classified into interactive and non- 
interactive types according as one drug does or does not modify the 
biological action of another. Simple similar action and independent 
action (Plackett and Hewlett [1948, 1952]) are regarded as the extreme
forms of non-interactive action. Basic equations for these extremes, 
expressing the conditions of non-response in the individual organism, 
are given; these place no restriction on the correlation of tolerances, 
nor on the relative slopes of the N.E.D.-log-dose lines for the separate 
drugs. Introduction of a parameter measuring the degree of similarity 
between the modes of action of the two drugs enables basic general 
equations for non-interactive joint action to be derived. When a 
basic general equation for non-interactive action is combined with an 
assumption of a bivariate-normal distribution of log-tolerances, the 
response to a mixture of drugs can be calculated. In general the calcu- 
lation requires integration of the bivariate-normal function over a 
particular non-rectangular region, which is feasible with an electronic 
computer; however, certain useful special cases require only a desk 
calculating machine. Exemplary sets of theoretical dose-response 
curves for mixtures are given.


SMALLEST COMPOSITE DESIGNS FOR QUADRATIC
RESPONSE SURFACES


H. O. Hartley 
Iowa State University, Ames, Iowa, U.S.A.


1. Description of Designs 


Consider an experimental situation in which a response y depends
on n different factors $x_i$, $i = 1, 2, \dots, n$. In a $2^n$ factorial design each
of these n factors is tried at 2 different levels which may be taken as
$x_i = -1$ and $x_i = +1$. In a complete factorial design all $2^n$ combinations
of the 2 levels $\pm 1$ of all n factors are tried. This design does
not permit the fitting of a quadratic response surface of the form:

$$y = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{i=1}^{n}\beta_{ii}x_i^2 + \sum_{i<j}\beta_{ij}x_ix_j. \tag{1}$$

In fact the $2^n$ factorial only permits estimation of the coefficients
$\beta_0$, $\beta_i$, $\beta_{ij}$, but does not give information on the $\beta_{ii}$. It was with
this latter object in mind that composite designs were introduced by
Box and Wilson [1951]. In their original form these designs add certain
additional points to the $2^n$ factorial. These augmented designs (sometimes
called “Star designs”) may be regarded as a one factor at a time
experiment in which we start at the center of the factor space ($x_i = 0$)
and then try each factor in turn at both a high ($\alpha_i$) and a low ($-\alpha_i$)
level, whilst all other factors are held at the central level 0.

This means that we add the following 2n + 1 points to the $2^n$ factorial
design:

$x_i = \pm\alpha_i$, $x_j = 0$ for $j \neq i$, $i = 1, 2, \dots, n$,
and
$x_i = 0$ for $i = 1, 2, \dots, n$.


The total number of experimental points is therefore composed as
follows:

                     $2^n$ Factorial    Star      Total
Number of points     $2^n$              2n + 1    $2^n + 2n + 1$


As n increases the first component ($2^n$) becomes a rapidly increasing,
predominant portion of the total number of points. In order to reduce
this, composite designs have been suggested in which the factorial
is replaced by a fractional replicate of a $2^n$ factorial, see e.g. Davies,
O. L. [1956] pp. 440-83.

It is well known that when such a reduction of experimental points
is made, it may not be possible to estimate all the coefficients in the
quadratic surface (1). Indeed, if we have a $1/2^k$ replicate of the $2^n$
design, the total number of experimental points is $2^{n-k} + 2n + 1$
whilst the number of coefficients in (1) is $2n + 1 + \tfrac12 n(n - 1)$ and
the latter may well exceed the former. The question then arises how
many and what coefficients can be estimated from a composite design.
The present paper gives a simple answer to this question.
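
The comparison between the number of points and the number of coefficients is easy to tabulate. The following sketch simply evaluates the two counts quoted above for a $1/2^k$ replicate of a $2^n$ factorial augmented by the star and center points; the function name is an illustrative choice.

```python
def composite_counts(n, k):
    """Return (points, coefficients) for a composite design.

    n : number of factors
    k : the factorial part is a 1/2**k replicate of the 2**n factorial
    """
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    points = 2 ** (n - k) + 2 * n + 1                  # fraction + star + center
    coefficients = 2 * n + 1 + n * (n - 1) // 2        # b0, b_i, b_ii, b_ij
    return points, coefficients


if __name__ == "__main__":
    for n, k in [(4, 1), (5, 1), (5, 2), (6, 2)]:
        pts, coef = composite_counts(n, k)
        verdict = "enough points" if pts >= coef else "too few points"
        print(f"n={n}, k={k}: {pts} points, {coef} coefficients ({verdict})")
```

For n = 5, k = 2 the design has 19 points but the full quadratic surface has 21 coefficients, so not every product coefficient can be estimated whatever fraction is chosen.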

It should be noted that in this paper we only examine composite 
designs with regard to their merits as a single (one-piece) experiment 
which is supposed to supply directly estimates of the parameters in 
the response surface (1). The question of suitable designs for sequential 
experimentation (on the lines recently discussed by C. Daniels [1958] 
in conjunction with first order designs) is not raised in this paper. 


2. The Fitting of a Quadratic Response Surface to a Composite Design 


We require the following concepts and notation concerning fractional
replicates of $2^n$ factorials:

Consider a $1/2^k$ replicate of a $2^n$ factorial. Writing m = n − k
we have $2^m$ points in such a design. Of the $2^n - 1$ contrasts in a complete
$2^n$ factorial experiment $2^k - 1$ are sacrificed and form the defining equations
of the design. The remaining $2^n - 2^k$ contrasts fall into
$2^{n-k} - 1 = 2^m - 1$ alias-sets, each set containing $2^k$ contrasts which are aliased
with one another.

We restrict ourselves to fractional replicates in which no main
effect is sacrificed, i.e. equated to the identity I, so that half of the
$2^{n-k}$ points have $x_i = 1$ and half have $x_i = -1$. We also assume
without loss of generality that whenever main effects are equated in
a defining equation of the form $x_i = \pm x_j$, the + sign has been
used throughout, i.e. that the defining equation is of the form $x_i = x_j$.

Although it is realized that this assumption is unconventional
(since it will usually not produce the principal block) it is nevertheless
found to simplify the subsequent exposition.

The designs here derived are based on the following fairly obvious 
statement: 


Theorem 1. 


a. In any composite design, in which no main effect is used as a
defining contrast of the $(1/2^k)2^n$ fractional replicate, it is always
possible to estimate the following coefficients of the quadratic
response surface (1):

all linear coefficients ($b_i$), all quadratic coefficients ($b_{ii}$), and
the constant $b_0$;

one of the product coefficients ($b_{ij}$) selected from each of the
alias-sets*.

b. It is not possible to estimate more than one of the product
coefficients ($b_{ij}$) from each alias-set.

The implication of Theorem 1 is that the largest number of coefficients
of (1) can be estimated if each alias-set of the fractional replicate
contains at least one two-factor interaction $x_ix_j$ permitting the selection
of one coefficient $b_{ij}$ from each of the alias-sets. In a composite design
of this kind we can estimate a total of $2n + 1 + 2^m - 1$ coefficients,
i.e. one $b_0$, n values of $b_i$, n values of $b_{ii}$, and $2^m - 1$ values of $b_{ij}$.
Since the total number of points is $2n + 1 + 2^m$, this design would
leave one degree of freedom for error if all possible $b_{ij}$ are fitted.

It is of some interest to spell out a few of these designs in some 
detail as they sometimes differ from those usually recommended for 
use as fractional replicates per se. 


a. The ½ replicate of a $2^4$


The usual design recommended is the one whose defining equation
is $x_1x_2x_3x_4 = +1$ (see e.g. Davies, O. L. [1956] p. 484). With this
design all main effects are clear of two-factor interactions. However,
the six two-factor interactions pair up in three of the seven alias-sets
so that only three of the six $b_{ij}$ can be estimated if this fractional
replicate is used in a composite design.

On the other hand if we use the defining equation $x_1 = x_2x_3$, we
obtain the following seven alias-sets of effects:

$x_1 = x_2x_3$, $x_2 = x_1x_3$, $x_3 = x_1x_2$, $x_4 = (x_1x_2x_3x_4)$,
$x_1x_4 = (x_2x_3x_4)$, $x_2x_4 = (x_1x_3x_4)$, $x_3x_4 = (x_1x_2x_4)$.

The effects shown inside parentheses are three-factor or higher and
do not represent any coefficients in the second-order law (1). It will
be seen that all two-factor interactions occur in different alias-sets so
that all six $b_{ij}$ can be estimated (along with all $b_i$, all $b_{ii}$, and $b_0$) when
this fractional replicate, set out below, is used in a composite design.


Point   x1   x2   x3   x4
1        1   -1   -1   -1
2       -1    1   -1   -1
3       -1   -1    1   -1
4        1    1    1   -1
5        1   -1   -1    1
6       -1    1   -1    1
7       -1   -1    1    1
8        1    1    1    1


*I owe to O. Kempthorne the observation that assertion a of the Theorem can be verified rather
directly as a consequence of the theory of linear estimation.
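
The alias structure of the two competing half replicates of the $2^4$ can be enumerated mechanically. The sketch below represents each effect as a set of factor indices, so that multiplying two effects is the symmetric difference of their index sets; it takes the two defining equations to be $x_1x_2x_3x_4 = +1$ and $x_1 = x_2x_3$ (i.e. $I = x_1x_2x_3$). All function names are illustrative, and signs of the defining contrasts are ignored because they do not affect which effects are aliased.

```python
from itertools import combinations


def defining_group(generators):
    """All products of the generator words, including the identity."""
    group = {frozenset()}
    for word in generators:
        group |= {w ^ frozenset(word) for w in group}
    return group


def alias_sets(n, generators):
    """Partition the non-sacrificed effects of a 2**n factorial into alias-sets."""
    group = defining_group(generators)
    seen, result = set(), []
    for size in range(1, n + 1):
        for combo in combinations(range(1, n + 1), size):
            effect = frozenset(combo)
            if effect in group or effect in seen:
                continue
            aliases = {effect ^ w for w in group}
            seen |= aliases
            result.append(aliases)
    return result


def estimable_products(n, generators):
    """Number of alias-sets containing at least one two-factor interaction,
    i.e. the number of product coefficients b_ij that can be fitted."""
    return sum(
        1 for s in alias_sets(n, generators) if any(len(e) == 2 for e in s)
    )


if __name__ == "__main__":
    print(estimable_products(4, [(1, 2, 3, 4)]))  # usual half replicate
    print(estimable_products(4, [(1, 2, 3)]))     # defining equation x1 = x2 x3
```

Under these assumptions the usual half replicate yields only three alias-sets that contain a two-factor interaction, whereas the fraction defined by $x_1 = x_2x_3$ yields six, one for each product coefficient.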


b. The ¼ replicate of the $2^6$


Again the usual design recommended (O. L. Davies, p. 486) is one
in which all main effects are clear from two-factor interactions whilst
the latter double up in other alias-sets. Therefore this design, when
used in a composite design, does not permit the estimation of all $b_{ij}$
in (1). However, the following fractional replicate does permit the
estimation of all fifteen $b_{ij}$:

As defining equations use:

$x_1 = x_2x_3$, $x_4 = x_5x_6$.

For the fifteen alias-sets we only list the main effects and the two-factor
interactions in them.

$x_1 = x_2x_3$, $x_2 = x_1x_3$, $x_3 = x_1x_2$, $x_4 = x_5x_6$, $x_5 = x_4x_6$, $x_6 = x_4x_5$,

whilst the nine remaining $x_ix_j$ are only aliased with three-factor or
higher-order effects.
This design is set out below:

Point   x1   x2   x3   x4   x5   x6
1        1   -1   -1    1   -1   -1
2       -1    1   -1    1   -1   -1
3       -1   -1    1    1   -1   -1
4        1    1    1    1   -1   -1
5        1   -1   -1   -1    1   -1
6       -1    1   -1   -1    1   -1
7       -1   -1    1   -1    1   -1
8        1    1    1   -1    1   -1
9        1   -1   -1   -1   -1    1
10      -1    1   -1   -1   -1    1
11      -1   -1    1   -1   -1    1
12       1    1    1   -1   -1    1
13       1   -1   -1    1    1    1
14      -1    1   -1    1    1    1
15      -1   -1    1    1    1    1
16       1    1    1    1    1    1


c. The ½ replicate of a $2^5$


Here the usually recommended fractional replicate (O. L. Davies,
p. 485) has all main effects and two-factor interactions in different
alias-sets so that all ten $b_{ij}$ can be estimated (along with the $b_i$, $b_{ii}$,
and $b_0$). The defining contrast is $x_1x_2x_3x_4x_5 = +1$.


d. The ¼ replicate of a $2^5$


Here we cannot expect to estimate all $b_{ij}$ since their number is
$\tfrac12(5 \times 4) = 10$ whilst the number of alias-sets is only $2^3 - 1 = 7$.
Thus the maximum number of $b_{ij}$ which can be estimated in accordance
with Theorem 1 is seven. The usually recommended fractional replicate
(O. L. Davies, p. 484) does in fact permit this estimation.

The defining equations are: 


The seven alias-sets for this design are as follows: 
= T= = Tks; Ly = 


Thus, in our estimation $b_{12}$ is aliased with $b_{34}$, $b_{13}$ is aliased with
$b_{24}$, and $b_{14}$ with $b_{23}$; all other $b_{ij}$ can be estimated along with all $b_i$, $b_{ii}$,
and $b_0$. The design is as follows:


Point 


Te Z3 Ts 
1 1 1 -1 -1 
2 1 1 -1 
3 -1 1 -1 1 -1 
4 -1 -1 1 1 —1 
6 -1 1 1 
7 1 1 
8 1 1 | 1 1 


To prove Theorem 1 suppose that a selection of coefficients satisfying
a and b has been made and that $p \le 2^m - 1$ product terms have been
selected.

It will be helpful to set out the design matrix X (i.e. the values
of $x_i$, $x_i^2$, and $x_ix_j$ which multiply into these coefficients $b_i$, $b_{ii}$, $b_{ij}$
at the $2^m + 2n + 1$ experimental points) in Table 1 below:


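TABLE 1

DESIGN MATRIX X GIVING THE VALUES OF THE $x_i$, $x_i^2$, $x_ix_j$ WHICH MULTIPLY
INTO THE COEFFICIENTS $b_i$, $b_{ii}$, $b_{ij}$ AT THE HEAD OF COLUMNS

$$
\begin{array}{l|c|ccc|ccc|ccc}
 & & \multicolumn{3}{c|}{n\ \text{linear}} & \multicolumn{3}{c|}{n\ \text{quadratic}} & \multicolumn{3}{c}{p\ \text{selected products}} \\
\text{Design points} & b_0 & b_1 & \cdots & b_n & b_{11} & \cdots & b_{nn} & b_{ij} & \cdots & b_{kl} \\ \hline
1/2^k\ \text{fractional replicate} & 1 & \pm 1 & \cdots & \pm 1 & +1 & \cdots & +1 & \pm 1 & \cdots & \pm 1 \\
\text{of the } 2^n \text{ factorial } (2^m \text{ points}) & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ \hline
\text{Center (1 point)} & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \hline
\text{Star } (2n \text{ points}) & 1 & -\alpha_1 & \cdots & 0 & \alpha_1^2 & \cdots & 0 & 0 & \cdots & 0 \\
 & 1 & \alpha_1 & \cdots & 0 & \alpha_1^2 & \cdots & 0 & 0 & \cdots & 0 \\
 & \vdots & & \ddots & & & \ddots & & \vdots & & \vdots \\
 & 1 & 0 & \cdots & -\alpha_n & 0 & \cdots & \alpha_n^2 & 0 & \cdots & 0 \\
 & 1 & 0 & \cdots & \alpha_n & 0 & \cdots & \alpha_n^2 & 0 & \cdots & 0
\end{array}
$$

This design matrix X has 2n + 1 + p columns and $2^m + 2n + 1$ rows.
We now form the (2n + 1 + p)-square matrix XX′ and show that its
determinant is positive.

In order to form XX′ we arrange the coefficients $b_0$, $b_i$, $b_{ii}$, $b_{ij}$ into
groups as follows:

A. The even order group: $b_0$, $b_{ii}$, $i = 1, 2, \dots, n$.

B. One alias group for any of those of the $2^m - 1$ alias-sets which
contain either at least one main effect or have a two-factor interaction
$x_ix_j$ represented in the p selected product terms.

It is easy to verify from Table 1 that the sum of the products of
elements for any two columns in different groups is zero, because for
two effects in different alias-sets the ±1 patterns in the fractional
replicate are orthogonal. This means that the determinant | XX′ |
is a product of the determinants evaluated for each group of coefficients
separately.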

We first consider an alias group under B. Denote by a the number
of coefficients in the alias group. We distinguish two cases. Case I:
A product term $x_ix_j$ has been selected from the alias-set; we call it the
a-th coefficient so that the first a − 1 coefficients are aliased main
effects. Case II: No product term has been selected from the alias-set
and all a coefficients are main effects. The a × a matrix XX′ for
such an alias group is of the form shown in Table 2 if we renumber the
main effects in the alias group 1, 2, …, a or (a − 1).

TABLE 2
MATRIX XX′ FOR ALIAS GROUPS
(a rows, a columns)

$$
\begin{pmatrix}
2^m + 2\alpha_1^2 & 2^m & \cdots & 2^m \\
2^m & 2^m + 2\alpha_2^2 & \cdots & 2^m \\
\vdots & \vdots & \ddots & \vdots \\
2^m & 2^m & \cdots & 2^m + 2\alpha_a^2
\end{pmatrix}
$$


In Case I we have $\alpha_a = 0$. If the sign of $x_ix_j$ differs from that of
the main effects in the alias-set, the last row and column may have to
have their signs changed, leaving the last diagonal coefficient $+2^m$. For
the purpose of evaluating the determinant of XX′, such a sign change
is irrelevant; it corresponds to replacing $b_{ij}$ by $-b_{ij}$.
It is possible, by standard methods (see e.g. Aitken, A. C. [1954]
p. 87) to derive the formula for the determinant $D_{II}$ (say) of the matrix
XX′ in Table 2 and show that it is the symmetric function of the $\alpha_i$
given by (3) below:

$$D_{II} = \prod_{i=1}^{a} 2\alpha_i^2 \left(1 + 2^{m-1}\sum_{i=1}^{a}\alpha_i^{-2}\right). \tag{3}$$

In Case I, when $\alpha_a = 0$, this reduces to

$$D_{I} = 2^m \prod_{i=1}^{a-1} 2\alpha_i^2. \tag{4}$$

Both $D_I$ and $D_{II}$ are clearly positive so long as all $\alpha_i$ involved in
$D_I$ are positive. Formula (4) for $D_I$ shows that $D_I = 0$ if one of the
remaining a − 1 values of $\alpha_i = 0$, and this implies that we cannot
estimate more than one product term from any alias group. We now
turn to the even-order group $b_0$, $b_{ii}$. The matrix XX′ for this group
is shown as Table 3.


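TABLE 3
MATRIX XX′ FOR EVEN-ORDER GROUP
(n + 1 rows, n + 1 columns)

$$
\begin{pmatrix}
2^m + 2n + 1 & 2^m + 2\alpha_1^2 & 2^m + 2\alpha_2^2 & \cdots & 2^m + 2\alpha_n^2 \\
2^m + 2\alpha_1^2 & 2^m + 2\alpha_1^4 & 2^m & \cdots & 2^m \\
2^m + 2\alpha_2^2 & 2^m & 2^m + 2\alpha_2^4 & \cdots & 2^m \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
2^m + 2\alpha_n^2 & 2^m & 2^m & \cdots & 2^m + 2\alpha_n^4
\end{pmatrix}
$$

The determinant $\Delta$ for this matrix can be shown to be given by
the symmetric function

$$\Delta = \prod_{i=1}^{n} 2\alpha_i^4 \left\{2^m\left(\sum_{i=1}^{n}\alpha_i^{-2} - 1\right)^2 + 2^{m-1}\sum_{i=1}^{n}\alpha_i^{-4} + 1\right\} \tag{5}$$

which is clearly positive for $\alpha_i > 0$. The numerical inversion of the
matrix XX′ is described by O. L. Davies (p. 557) in the special case
when all $\alpha_i$ are equal to $\alpha$, but no use is made there of (5) which, in
this case, reduces to the particularly simple form

$$\Delta' = 2^n\alpha^{4n}\left\{2^m\left(n\alpha^{-2} - 1\right)^2 + 2^{m-1}n\alpha^{-4} + 1\right\}. \tag{6}$$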

3. The Variances and Covariances of the Estimated Coefficients 


The algebraic forms of the relevant determinants (3), (4), and (5)
permit the derivation of similar forms for the minors and hence an
algebraic form of the inverse matrix. Whilst in most practical situations
the numerical inversion of the matrix XX′ and solution of the equations
are comparatively simple, some of the inverse elements will be listed
here as they throw an interesting light on the variances and covariances
of the coefficients estimated from a composite design.

Any two coefficients in two different alias groups are of course
uncorrelated so that we may confine ourselves to a single alias group
and here we again distinguish the two cases:

I. A product term $x_kx_l$ has been selected from the alias group.

II. No product term has been selected from the alias group.

Application of formulas (4) and (5) leads to the following formulas
for the variances and covariances in which $\sigma^2$ is the residual variance
of the response surface.

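Case I ($b_{kl}$ denotes the product term of the alias group):

$$\operatorname{Var} b_i = \sigma^2/2\alpha_i^2, \tag{7}$$

$$\operatorname{Var} b_{kl} = \sigma^2\left(2^{-m} + \tfrac12\sum_{i=1}^{a-1}\alpha_i^{-2}\right), \tag{8}$$

$$\operatorname{Cov} b_ib_j = 0, \tag{9}$$

$$^{*}\operatorname{Cov} b_ib_{kl} = \mp\tfrac12\alpha_i^{-2}\sigma^2. \tag{10}$$

Case II:

$$\operatorname{Var} b_i = \sigma^2\,\frac{1 + 2^{m-1}\sum_{t \neq i}\alpha_t^{-2}}{2\alpha_i^2\left(1 + 2^{m-1}\sum_{t=1}^{a}\alpha_t^{-2}\right)}, \tag{11}$$

$$\operatorname{Cov} b_ib_j = -\,\frac{2^{m-2}\alpha_i^{-2}\alpha_j^{-2}\sigma^2}{1 + 2^{m-1}\sum_{t=1}^{a}\alpha_t^{-2}}. \tag{12}$$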


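Finally we obtain for the even-order group ($b_0$, $b_{ii}$) the following
variances and covariances:

$$\sigma^2/\operatorname{Var} b_0 = 1 + 2^m\left(\sum_{i=1}^{n}\alpha_i^{-2} - 1\right)^2 \Big/ \left(1 + 2^{m-1}\sum_{i=1}^{n}\alpha_i^{-4}\right), \tag{13}$$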


$i" + 2( ai" | + i} = 


Var bj; = 


*The sign of this covariance will depend on the signs selected in the defining contrasts.

Cov $b_{ii}b_{jj}$


o{2"| 1 —2a*; — 2a; — (ai (a, ‘| taal} 


n 


+ — 1) | +1 


t=1 


Cov bob; ; 


These formulas simplify considerably in the special case in which 
all a; are equal. 

We may use these formulas to work out certain relative efficiencies.
We may compare, for example, the variance of a linear coefficient $b_i$
under two conditions, i.e. first when none of the fitted $b_{kl}$ corresponds
to a two-factor interaction $x_kx_l$ aliased with $x_i$, when the variance of
$b_i$ (Case II) is given by (11). This is to be compared with Var $b_i$ when
such a fitted two-factor interaction is aliased with $x_i$ and the variance
of $b_i$ (Case I) is given by (7). Dividing the former (smaller) variance
by the latter we obtain the resulting per cent efficiency e as

$$e = 100\,\frac{1 + 2^{m-1}\sum_{t \neq i}\alpha_t^{-2}}{1 + 2^{m-1}\sum_{t=1}^{a}\alpha_t^{-2}}. \tag{17}$$

In the special case a = 1 (only one main effect in the alias group)
we have

$$e = 100/(1 + 2^{m-1}/\alpha_i^2). \tag{18}$$

The reduction in per cent efficiency e may be regarded as the price
paid in loss of precision of $b_i$ through introducing the product term
$b_{kl}x_kx_l$. It will be seen that as $\alpha_i \to \infty$ there is 100% efficiency, i.e.
no loss in precision. If, however, $2^{m-1} \gg \alpha_i^2$ the per cent efficiency
may tend to zero. For the fairly characteristic values of $\alpha_i = 2$ and
m = 3, we have

$$e = 100/(1 + 4/4) = 50\%, \tag{19}$$

i.e. we double the variance of $b_i$ when introducing the product term of
its alias group. For the somewhat large value of $\alpha_i = 3$, we have

$$e = 100/(1 + 4/9) = 69\%. \tag{20}$$

A second efficiency comparison of interest arises when we compare the
respective variances of $b_i$ when $x_i$ has no alias main effect with that
when $x_i$ does have an alias main effect $x_j$ (say).

Two situations arise:

a. The alias group of $x_i$ does already contain a two-factor interaction
$x_kx_l$ (say) which is being fitted in the law (1).

b. The alias group of $x_i$ does not contain such a two-factor interaction.

In case (a) we have from (7) Var $b_i = \sigma^2/2\alpha_i^2$ no matter how many
main effects are aliased with $x_i$. Therefore there is no (additional)
per cent reduction in efficiency here beyond that already evaluated
in (17). In case (b), however, we find from (11) (evaluated for a = 1
and a = 2) for the per cent efficiency e that

$$e = 100\left[1 - 2^{2m-2}/(2^{m-1} + \alpha_i^2)(2^{m-1} + \alpha_j^2)\right]. \tag{21}$$

Again as $\alpha_i$ and/or $\alpha_j \to \infty$ the per cent efficiency tends to 100% but
if both $2^{m-1}/\alpha_i^2 \to \infty$ and $2^{m-1}/\alpha_j^2 \to \infty$ the efficiency tends to zero.
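
The two efficiency comparisons are simple enough to tabulate for a range of star distances. The sketch below assumes the per cent efficiencies have the forms $100(1 + 2^{m-1}\sum_{t\neq i}\alpha_t^{-2})/(1 + 2^{m-1}\sum_t\alpha_t^{-2})$ for introducing a product term and $100[1 - 2^{2m-2}/((2^{m-1} + \alpha_i^2)(2^{m-1} + \alpha_j^2))]$ for acquiring one aliased main effect; the function names are illustrative.

```python
def efficiency_with_product_term(m, alpha_i, other_alphas=()):
    """Per cent efficiency of b_i when a product term joins its alias group.

    m            : the fraction has 2**m points
    alpha_i      : star distance for factor i
    other_alphas : star distances of any other main effects aliased with x_i
    """
    w = 2.0 ** (m - 1)
    rest = sum(a ** -2 for a in other_alphas)
    return 100.0 * (1.0 + w * rest) / (1.0 + w * (rest + alpha_i ** -2))


def efficiency_with_aliased_main_effect(m, alpha_i, alpha_j):
    """Per cent efficiency of b_i when x_i is aliased with one other main
    effect x_j and no product term is fitted from the alias group."""
    w = 2.0 ** (m - 1)
    return 100.0 * (1.0 - w * w / ((w + alpha_i ** 2) * (w + alpha_j ** 2)))


if __name__ == "__main__":
    for alpha in (1, 2, 3, 4):
        print(alpha, round(efficiency_with_product_term(3, alpha), 1))
```

For m = 3 this reproduces the two worked values in the text (50% at $\alpha_i = 2$ and about 69% at $\alpha_i = 3$), and shows how quickly the loss shrinks as the star points are moved outwards.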


4. The Variance of the Estimated Response Surface 


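In this section we combine the variances and covariances of the
individual coefficients to obtain the variances of the estimate of the
response surface (1). To simplify the algebra we confine ourselves to
the special case of equal $\alpha_i = \alpha$. Formulas (7) to (16) of Section 3
then simplify as follows:

Case I ($b_{kl}$ denotes the product term of the alias group):

$$\operatorname{Var} b_i = \tfrac12\alpha^{-2}\sigma^2, \tag{22}$$
$$\operatorname{Var} b_{kl} = \sigma^2\left[2^{-m} + \tfrac12(a - 1)\alpha^{-2}\right], \tag{23}$$
$$\operatorname{Cov} b_ib_j = 0, \tag{24}$$
$$\operatorname{Cov} b_ib_{kl} = \mp\tfrac12\alpha^{-2}\sigma^2. \tag{25}$$

Case II:

$$\operatorname{Var} b_i = \tfrac12\alpha^{-2}\sigma^2 - \alpha^{-4}\sigma^2 2^{m-2}/(1 + 2^{m-1}a\alpha^{-2}), \tag{26}$$
$$\operatorname{Cov} b_ib_j = -\alpha^{-4}\sigma^2 2^{m-2}/(1 + 2^{m-1}a\alpha^{-2}). \tag{27}$$

The even-order group:

$$\sigma^2/\operatorname{Var} b_0 = 1 + 2^m(n\alpha^{-2} - 1)^2/(1 + 2^{m-1}n\alpha^{-4}), \tag{28}$$
$$\operatorname{Var} b_{ii} = \sigma^2w^{-1}\left\{2^m\left[\tfrac12(n - 1)(2n + 1)\alpha^{-4} - 2(n - 1)\alpha^{-2} + 1\right] + 3\right\}, \tag{29}$$

where

$$w = 2\alpha^4\left\{2^m(n\alpha^{-2} - 1)^2 + 2^{m-1}n\alpha^{-4} + 1\right\}. \tag{30}$$
$$\operatorname{Cov} b_{ii}b_{jj} = -\tfrac12\alpha^{-4}\sigma^2w^{-1}\left\{2^m[2n + 1 - 4\alpha^2] - 4\alpha^4\right\}, \tag{31}$$
$$\operatorname{Cov} b_0b_{ii} = -2\sigma^2w^{-1}(2^{m-1} + \alpha^2). \tag{32}$$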
The estimated response surface $Y = Y(x_1, \dots, x_n)$ is now given
by

$$Y = b_0 + \sum_{i=1}^{n} b_ix_i + \sum_{i=1}^{n} b_{ii}x_i^2 + \sum\nolimits_p b_{ij}x_ix_j, \tag{33}$$

where the last sum $\sum_p$ is extended over the selected product terms
$b_{ij}x_ix_j$.

Since the coefficients in different groups are uncorrelated, the
variance of Y is composed of contributions from each of the groups of
coefficients. We evaluate a typical representation from each of these
groups:

Alias group (Case I):

Denoting the selected product term by $x_kx_l$ and the contribution to
the variance of Y by $V_I(Y)$ we find

$$V_I(Y) = \tfrac12\alpha^{-2}\sigma^2\sum\nolimits'(x_i \mp x_kx_l)^2 + \sigma^2 2^{-m}x_k^2x_l^2, \tag{34}$$

where the summation $\sum'$ extends over the a − 1 main effects $x_i$ in the
alias group considered and the ∓ sign depends on the signs selected
in the defining contrasts.


Alias group (Case II):
Similarly we obtain for the variance contribution in Case II:

$$V_{II}(Y) = \tfrac12\alpha^{-2}\sigma^2\sum\nolimits' x_i^2 - 2^{m-2}\alpha^{-4}\sigma^2\left(\sum\nolimits' x_i\right)^2\Big/(1 + 2^{m-1}a\alpha^{-2}), \tag{35}$$

where the summation $\sum'$ extends over the a main effects $x_i$ in the alias
group.

Finally we obtain as a variance contribution from the n + 1 coefficients,
$b_0$, $b_{11}$, …, $b_{nn}$, the component

$$V(Y) = A + B\sum_{i=1}^{n}x_i^2 + C\sum_{i=1}^{n}x_i^4 + D\left(\sum_{i=1}^{n}x_i^2\right)^2, \tag{36}$$

where
D = Cov $b_{ii}b_{jj}$, [given by equation (31)]
C = Var $b_{ii}$ − Cov $b_{ii}b_{jj}$, [given by equations (29) and (31)]
B = 2 Cov $b_0b_{ii}$, [given by equation (32)]
A = Var $b_0$. [given by equation (28)]

The complete variance of Y is given by summation of the above
contributions over all the groups of coefficients.

It is instructive to employ these formulas for evaluating the effect
on Var Y of introducing a product term $b_{kl}x_kx_l$ into the fitted law.
Consider an alias-set in which this term has been selected as the product
term among a total of a terms. The resulting alias group of coefficients
is then one belonging to Case I and its variance contribution is given by
(34). If this term is omitted from the fit we convert this alias group
to one belonging to Case II resulting in a variance contribution given
by (35) with a replaced by a − 1.

Thus the increase in variance of Y which is due to unnecessarily
including the product term $b_{kl}x_kx_l$ in the fit is given by

$$\sigma^2\,\frac{2\alpha^2 + (a - 1)2^m}{2^{m+1}\alpha^2}\left\{x_kx_l \mp \frac{2^m\sum' x_i}{2\alpha^2 + (a - 1)2^m}\right\}^2. \tag{37}$$

If, however, the decision is made (wrongly) not to include the term
$b_{kl}x_kx_l$ in the fit, there will result a bias in the fitted law given by

$$-\beta_{kl}\left\{x_kx_l \mp \frac{2^m\sum' x_i}{2\alpha^2 + (a - 1)2^m}\right\}, \tag{38}$$

where the $\sum' x_i$ is taken over the (a − 1) main effects which are in the
same alias-set as $x_kx_l$.

The question whether the risk of a bias given by (38) may be warranted
in view of the variance reduction (37) will in general depend
on the magnitude of the ratio $\beta_{kl}^2/\sigma^2$. Similar assessments can be made
for the omissions of other terms in the fitted law.


Summary 


Composite designs consist of a 1/2" fractional replicate of a 2" 
factorial with factor levels at 7, = +1(¢ = 1, 2, --- , n) to which is 
added a “star design” = +a,,2; = Oforj ¥iandi = 1,2,---,n 
and x; = 0 fori = 1, 2, --- n. This design is recommended in the 
literature for the fitting of a general second-order response surface. 
It is shown in this paper that the designs usually recommended as 
fractional replicates per se are not necessarily the best when used in 
composite designs. Alternatives permitting the estimation of the 
largest possible number of coefficients in the second-order surface are 
developed and their efficiencies discussed. 
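The layout of a composite design can be made concrete with a small sketch. The function name `composite_design` and the value chosen for the star distance are illustrative assumptions, and the full 2^n factorial is used for the factorial part; a fractional replicate, which is the subject of the paper, would keep only a subset of the corner points.

```python
from itertools import product


def composite_design(n, alpha, include_center=True):
    """Return the points of a composite design in n factors.

    Assumption: the full 2^n factorial supplies the corner points; a
    fractional replicate would retain only some of them.
    """
    # Factorial part: every combination of levels -1 and +1.
    corners = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=n)]
    # Star part: one factor at -alpha or +alpha, all others at 0.
    star = []
    for i in range(n):
        for sign in (-1.0, 1.0):
            point = [0.0] * n
            point[i] = sign * alpha
            star.append(tuple(point))
    # Center point: all factors at 0.
    center = [tuple([0.0] * n)] if include_center else []
    return corners + star + center


design = composite_design(n=3, alpha=1.5)
print(len(design))  # 8 corner points + 6 star points + 1 center point
```

With n = 3 this gives 15 points, against the 10 coefficients of a general second-order surface in three factors.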
141 NOTE: A Rapid Significance
Test For Contingency Tables


MORTON KUPPERMAN


The George Washington University 
Washington, D. C., U.S. A. 


The recent publication of a table of 2n ln n for n = 1(1)2,009, 4D (Woolf [1957]) and a table of n ln n for n = 1(1)1,000, 10D (Kullback [1959]) now makes it possible to calculate simply and rapidly a statistic for an r × c contingency table for testing a null hypothesis of independence of the row and column classifications. The procedure to be described is discussed in detail by Woolf and Kullback. The former's approach is via the likelihood-ratio statistic and the latter's is via information theory, using the information statistic $2\hat{I}$. The relation between the two statistics is: $2\hat{I} = -2 \ln$ (likelihood ratio). The information statistic gives a reasonably close approximation to the $\chi^2$ statistic in large samples (Wilks [1935]).

Consider a contingency table of r rows and c columns, into which the N independent observations of a random sample have been categorized by both a row and a column classification. Let $f_{ij}$ represent the frequency for the cell in the ith row and the jth column of the table. Let $f_{i.}$ represent the sum of the frequencies in the ith row and let $f_{.j}$ represent the sum of the frequencies in the jth column; $f_{i.}$ and $f_{.j}$ are thus the marginal totals. Obviously

$$\sum_{i=1}^{r} f_{i.} = \sum_{j=1}^{c} f_{.j} = N.$$


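The contingency table may be represented by the following array:

Row                        Column classification
classification      1          2         ...      j         ...      c          Total
      1          $f_{11}$   $f_{12}$    ...   $f_{1j}$    ...   $f_{1c}$   $f_{1.}$
      2          $f_{21}$   $f_{22}$    ...   $f_{2j}$    ...   $f_{2c}$   $f_{2.}$
     ...
      i          $f_{i1}$   $f_{i2}$    ...   $f_{ij}$    ...   $f_{ic}$   $f_{i.}$
     ...
      r          $f_{r1}$   $f_{r2}$    ...   $f_{rj}$    ...   $f_{rc}$   $f_{r.}$
    Total        $f_{.1}$   $f_{.2}$    ...   $f_{.j}$    ...   $f_{.c}$      N


The information statistic is

$$2\hat{I} = 2\Big[\sum_{i=1}^{r}\sum_{j=1}^{c} f_{ij} \ln f_{ij} + N \ln N - \sum_{i=1}^{r} f_{i.} \ln f_{i.} - \sum_{j=1}^{c} f_{.j} \ln f_{.j}\Big].$$

For large samples, $2\hat{I}$ is asymptotically distributed in a $\chi^2$-distribution with (r - 1)(c - 1) degrees of freedom under the null hypothesis of independence (Wilks [1935]; cf. Kullback [1959, pp. 157-58]).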

The information statistic $2\hat{I}$ is very simply calculated from (r + 1)(c + 1) entries in either Woolf's table of 2n ln n or Kullback's table of n ln n, rounded to 2 or 3 decimal places. For small values of r and c, such as in the fourfold and 3 × 3 tables, it is easy to write the figures on a sheet of paper and add them up by hand (positive and negative values separately).

The information statistic $2\hat{I}$ is treated just like the $\chi^2$ statistic calculated laboriously for the contingency table and is, indeed, very close to the value of $\chi^2$. Rapid methods of calculating contingency-table $\chi^2$ have previously been described by Leslie [1951] and Skory [1952] in this journal; it will be found, however, that the use of $2\hat{I}$ in place of $\chi^2$ will result in a considerable saving in time, particularly if the number of rows or columns is large or if there are many contingency tables to analyze.

A numerical example, using Leslie's data, follows.

                                    Total
   W1         24       7       7       38
   W2         76      38      70      184
   W3         69      32      82      183
   W4         27       9      55       91
   Total     196      86     214      496

It is suggested that the reader perform both the $\chi^2$ and the information calculations in order to experience for himself the advantage of escaping the squarings, multiplications, and divisions that plague the calculator of $\chi^2$. By using the table of n ln n, we find that the information statistic is

$$2\hat{I} = 2(1961.848 + 3078.462 - 2461.600 - 2565.903) = 25.61.$$


Normally the separate totals shown above need not be recorded, but they are presented here solely for purposes of illustration. The value of $\chi^2$ as calculated by Leslie and Skory is 24.93; thus the difference, in this example, is only 0.68. Our value of $2\hat{I}$ is as close to the value of $\chi^2$ as we need in practical work, but the labor saved is considerable, to say nothing of the reduced chance of error in our numerical calculations.
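The same arithmetic can be sketched in a few lines of Python, computing n ln n directly instead of reading it from a table. The function names are illustrative, and the third cell of row W1 (7) is inferred from the row and column totals of Leslie's data rather than read from the printed table.

```python
import math

# Leslie's data; the third cell of row W1 is inferred from the marginal totals.
table = [
    [24, 7, 7],
    [76, 38, 70],
    [69, 32, 82],
    [27, 9, 55],
]


def n_ln_n(n):
    """Return n ln n, with the convention 0 ln 0 = 0."""
    return n * math.log(n) if n > 0 else 0.0


def information_statistic(t):
    """2I = 2[sum of cell terms + N ln N - row-total terms - column-total terms]."""
    rows = [sum(r) for r in t]
    cols = [sum(c) for c in zip(*t)]
    total = sum(rows)
    cells = sum(n_ln_n(f) for r in t for f in r)
    return 2.0 * (cells + n_ln_n(total)
                  - sum(n_ln_n(x) for x in rows)
                  - sum(n_ln_n(x) for x in cols))


def chi_square(t):
    """Ordinary chi-square statistic for independence, for comparison."""
    rows = [sum(r) for r in t]
    cols = [sum(c) for c in zip(*t)]
    total = sum(rows)
    return sum((f - ri * cj / total) ** 2 / (ri * cj / total)
               for r, ri in zip(t, rows) for f, cj in zip(r, cols))


two_i = information_statistic(table)
chi2 = chi_square(table)
print(two_i, chi2)  # close to the 25.61 and 24.93 quoted in the text
```

Both statistics are referred to the $\chi^2$ distribution with (4 - 1)(3 - 1) = 6 degrees of freedom.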

Note that only (r + 1)(c + 1) quantities (5 × 4 = 20 in this example) need to be looked up in the table and either added or subtracted on an adding machine. Note further that, as a result of adding rc + 1 quantities and subtracting r + c quantities, one of the numbers left in one of the dials of a machine such as the Friden, Marchant, or Monroe calculating machine is the number of degrees of freedom, namely (r - 1)(c - 1); for this example the number of degrees of freedom is 3 × 2 = 6. This latter observation provides a good partial check on the completeness of one's calculations.
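The check rests on a simple count of the items entered with each sign, written out here for the 4 × 3 example:

```latex
(rc + 1) - (r + c) = rc - r - c + 1 = (r - 1)(c - 1),
\qquad\text{e.g.}\quad (4 \cdot 3 + 1) - (4 + 3) = 13 - 7 = 6 .
```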

The use of the information statistic $2\hat{I}$ (or minus twice the natural logarithm of the likelihood-ratio statistic) for analyzing contingency-table data has been noted by several authors in the past, notably Wilks [1935], Rao [1952, p. 200], and Woolf [1957]. Mood [1950, p. 276] also discusses the likelihood-ratio statistic for contingency tables, but does not give the explicit formula shown above, although it is implied in his discussion of formula (8) on p. 276. For a thorough discussion of the problem of the analysis of contingency-table data from the standpoint of statistical information theory, see Chapter 8, “Contingency Tables,” of the book by Kullback.

It should be noted that the procedure for testing a null hypothesis of independence between row and column classifications in an r × c contingency table is similar to the procedure for testing the homogeneity of r independent samples from a c-valued population; the row totals (sample sizes) are fixed in advance. The statistic and the number of degrees of freedom turn out to be the same for both problems, but it should be emphasized that the two problems are distinct. (See Kupperman [1959] for an application of the information statistic and the table of n ln n to the problem of testing the homogeneity of two independent samples.)

It would be most useful to have Woolf's or Kullback's table extended to at least n = 10,000, for in practice frequencies greater than 1,000 often arise.


142 QUERY: On a Quail Roadside Count Technique


Our department has recently completed the second year of a study evaluating the reliability of our Gambel's quail roadside count technique. In order to learn something of the variation in the counts, five consecutive morning and five consecutive evening counts were made on eight of the already established survey routes. The routes were considered in the analysis as a random selection of all possible routes throughout the state. Route mean counts proved to be proportional to their standard deviations so a log transformation was applied in a manner similar to the procedure outlined by Hartley, Homeyer, and Kozicky (J. Wildlife Mgt. 19 (4): 495 [1955]).

Based upon the analysis of variance table enclosed I would like to be able to make some recommendation on the number of road routes needed to detect certain specified minimum levels of change in the count between years when it occurs. First I used the formula (G. W. Snedecor [1956], Statistical Methods, 5th ed., I.S.C. Press, pp. 275, 309, and 320) using the untransformed count data but I am not sure that this procedure is correct. Is it possible to use the log data for these calculations? If routes are random effects and we are testing for difference between 2 years, what do we use for $s_0$ in your sample size formula: $\sigma_{ac}^2$ (see Table)? Also, how do you calculate delta?


TABLE 1
ANALYSIS OF VARIANCE OF QUAIL COUNTS TRANSFORMED BY THE LOG TRANSFORM*

Source of                      Degrees of   Mean Square   Expected
Variation                      Freedom      (× 10^5)      Mean Squares

Replicates (r)                     4            456       $\sigma^2 + abc\sigma_r^2$
Years (a)                          1         174413       $\sigma^2 + rb\sigma_{ac}^2 + rbcK_a^2$
AM vs. PM (b)                      1          91105       $\sigma^2 + ra\sigma_{bc}^2 + racK_b^2$
Routes (c)                         7          16134       $\sigma^2 + rab\sigma_c^2$
Years × AM vs. PM                  1          14203       $\sigma^2 + r\sigma_{abc}^2 + rcK_{ab}^2$
Routes × years                     7          20596       $\sigma^2 + rb\sigma_{ac}^2$
Routes × AM vs. PM                 7           1322       $\sigma^2 + ra\sigma_{bc}^2$
Routes × AM vs. PM × years         7           1831       $\sigma^2 + r\sigma_{abc}^2$
Error                            124            878       $\sigma^2$

Total                            159


*In the table $K^2$ represents a mean square component for a fixed effect as defined by Snedecor (see reference above), Sec. 10.11, p. 258.

Log Mean (1958) = 2.494, Quail Count Mean (1958) = 255, 
Log Mean (1957) = 2.284, Quail Count Mean (1957) 9s. 


ANSWER: 


In studying your questions I have referred to the paper by Hartley et al. [1955] to which you referred in your letter; a preceding paper upon which this paper was based, Kozicky, Hendrickson, Homeyer, and Speaker, “The Adequacy of the Fall Roadside Pheasant Census in Iowa,” Trans. of 17th North American Wildlife Conference, March [1952], published by Wildlife Mgt. Inst., Washington 5, D. C.; a later paper by Kozicky, Jessen, Hendrickson, and Speaker, “Estimation of Fall Quail Populations in Iowa,” J. Wildlife Mgt. 20 (2), 97 [1956]; and the concepts of matched sampling units as described by Dr. R. J. Jessen in “Statistical Investigation of a Sample Survey for Obtaining Farm Facts,” Res. Bull. 304, Iowa Agricultural Experiment Station [1942].

In considering your questions some assumptions have to be made, of course, which you have recognized in part in your letter. First, the routes must be a random sample from some population of possible routes. Second, I am not clear about the first source of variation in your analysis of variance, “Replicates.” Are these replicates merely duplicates of determinations for each route, five times in the morning and five times in the afternoon for each route, or is there a definite time sequence pattern for traversing all routes over, say, a week or two-week period such that this source of variation should be removed? If so, I wonder about the sum of squares isolated for the source being so small, since the M.S. for Replicates is less than your error M.S. If the replicates are merely duplicate determinations, the sum of squares for “Replicates” can be included with “Sampling Error,” which then has 128 degrees of freedom instead of 124.

Next, I wish to remark that your difference between AM and PM is so large that you would wish, perhaps, to use only one of the two times of day for estimating the quail density. Hence, it would be of interest to compute an analysis of variance for AM and PM separately and use the one of most interest. This would produce some further information on homogeneity of variation, too.

Assuming such a revised analysis of variance to be prepared, the source of variation to concentrate on is the “Routes × Years” when 100 per cent matching of routes is being considered. This Routes × Years Mean Square then would still contain the components of variation indicated in your analysis of variance table. The square root of this Routes × Years mean square will then be your $s_0$ rather than the variance component suggested in your letter.

If you prepare separate analyses of variance for AM and PM, the mean square for “Routes × Years” will have an expectation of $\sigma^2 + r\sigma_{ac}^2$ with r = 5 since b will drop out in dividing the data. The previous remarks about $s_0$ are correct if you plan to continue the same route coverage procedure, i.e., five replicates or duplicate counts. Should you, however, wish to change this procedure to only one or two replicates for each route since this source of variation is small, a new value of $s_0$ would need to be computed for use in the formula. This new value would be based on the value of r and the estimated variance components for $\sigma^2$ and $\sigma_{ac}^2$. Using your data as an example and r = 2 would give .00878 + 2(.01972) for $s_0^2$ although you will actually get a different value for r = 2 from the separate analyses for AM and PM.
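The arithmetic behind the figures .00878 and .01972 can be sketched as follows. This is an illustration under stated assumptions: the mean squares in Table 1 are taken to be tabulated multiplied by 10^5 (which reproduces the .00878 quoted above), the Routes × Years mean square is taken to estimate the error variance plus r·b times the routes-by-years component, and the variable names are made up for the example.

```python
import math

# Mean squares from Table 1, assumed to be tabulated multiplied by 10^5.
ms_error = 878e-5            # estimates the error variance
ms_routes_years = 20596e-5   # estimates error variance + r*b*(routes x years component)
r, b = 5, 2                  # replicates per route; times of day in the combined analysis

# Estimated routes-by-years variance component.
var_ac = (ms_routes_years - ms_error) / (r * b)


def s0_squared(r_new):
    """Expected Routes x Years mean square with r_new replicates at one time of day."""
    return ms_error + r_new * var_ac


s0_sq = s0_squared(2)
s0 = math.sqrt(s0_sq)
print(var_ac, s0_sq, s0)  # roughly 0.0197, 0.0482 and 0.22
```

As the answer points out, separate AM and PM analyses would give somewhat different component estimates, so these numbers only illustrate the calculation.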

Instead of beginning with Snedecor's formula for sample size on p. 275, we might begin with the simpler formula on p. 60 (chapter 2) and then work up to the formula on p. 275. Thus, one can see what price he needs to pay for some certainty of detecting a specified difference. If using the formula on p. 60, for $s_0$ you would use $\sqrt{2}$ as a multiplier for the $s_0$ just indicated in the preceding paragraph (note that your Routes × Years Mean Square is similar to the Error Mean Square discussed by Snedecor in problems 11.2.2-11.2.4, p. 294).

Now, let us consider the problem of $\delta$. Referring to Snedecor, p. 321, and an ordinary table of logarithms, we see that a difference in logarithms of about 0.08 would correspond to about a 20% change in means. Hence, you might use the analysis of variance of the logarithms with $\delta$ as 0.08 or any other suitable value for a per cent change you might wish to detect.
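The conversion from a per cent change to a difference in common logarithms is a one-line calculation; the helper name below is illustrative.

```python
import math


def log_difference(percent_change):
    """Difference in common (base-10) logarithms for a given per cent change in the mean."""
    return math.log10(1.0 + percent_change / 100.0)


delta = log_difference(20)
print(round(delta, 3))  # 0.079, i.e. about 0.08
```

A 50% increase, for comparison, corresponds to a difference in logarithms of about 0.18.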

It should be noted that your transformation is different from the 
one described by Snedecor. Referring to Hartley et al. [1955] the trans- 
form you have used is y = log (n + c) — log m where n is your quail 
count on a route and c = m/(a/b). In the expression for c, m is the 
length of route in miles and “a” and “b” are the estimates of slope 
and intercept in the linear relation between the standard deviation 
of number of quails observed per mile of route length and the mean 
number per mile. From the transform we see that a standard deviation 
in logarithms is related to a coefficient of variation in n + c and not 
n alone. Your problem, however, is one of testing a hypothesis rather 
than estimation. Therefore, it seems appropriate to determine the 
sample size from the log analysis. The purpose of the transform has 
been to stabilize variance which is desired for making a test of change 
in quail numbers. 

H. JEBE 
Iowa State University
Ames, Iowa, U.S. A. 


CORRECTIONS 


In the paper by F. B. Leech and M. J. R. Healy [1959], Biometrics 
15: 98-106, Equation (12) on p. 102 should read: 


ly = kets) / Pro 


In the paper by D. E. W. Schumann and R. A. Bradley [1959], 
Biometrics 15: 405-416, the following reference on p. 416 was omitted: 


Cochran, W. G. [1943]. The comparison of different scales of measurement for 
experimental results. Ann. Math. Stat. 14, 205-216. 


In Abstract 574 by R. E. Bargmann [1959], Biometrics 15: 330, replace “Hadermar” with “Hadamard”.


ABSTRACTS


The following are additional abstracts of papers that were presented at joint meetings of the Eastern North American Region of the Biometric Society, The Institute of Mathematical Statistics, and the Physical Science Section of the American Statistical Association held at the Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania on March 19 to 21, 1959.


612 F. J. ANSCOMBE (Princeton University, Princeton, New Jersey). Rejection of Outliers.

If one reading is a long way from the rest in a series of replicate determinations, there is temptation to reject it as spurious. Numerous criteria for the rejection of outliers have been proposed and discussed during the past hundred years. They seem always to have been regarded as something like significance tests, and attention has been focused on rejection rates. It is suggested that rejection rules are not significance tests but insurance policies, and attention would be better focused on error variance. A detailed study is made of the effect of routine application of rejection criteria to (chemical or other) determinations made in triplicate.


613 ALLAN BIRNBAUM (Columbia University, New York, New York*). Maximum Likelihood Methods: Generalizations and Non-asymptotic Justifications.

A treatment of estimation problems which leads to extensions and unification of current theories is based on the following formulation for the case of a real-valued parameter $\theta$: Let $\theta^*$ denote any given estimator; let $a(u, \theta)$ be defined for each $u > \theta$ as Prob $(\theta^* > u \mid \theta)$ and for each $u < \theta$ as Prob $(\theta^* < u \mid \theta)$, so that probabilities of all types of errors of estimation are represented by values $a(u, \theta)$. Admissible estimators (those for which no error-probability $a(u, \theta)$ can be decreased except by increase of another) and complete classes of estimators are defined in the usual decision-theoretic way, and characterized by use of the usual (Bayes solution) techniques. Generalized maximum likelihood estimators are defined as follows: Let $\theta'(\theta)$ and $\theta''(\theta)$ satisfy, for each $\theta$, $\theta'(\theta) \le \theta \le \theta''(\theta)$ with at least one inequality strict. Let $t(x, \theta) = [\log f(x, \theta''(\theta)) - \log f(x, \theta'(\theta))]$ where $f(x, \theta)$ is the density function of the sample point x. Let $G(\theta, a)$ be defined by $a$ = Prob $[t(X, \theta) < G(\theta, a) \mid \theta]$. Let $a(\theta)$ satisfy $0 < a(\theta) < 1$. Let $v(x, \theta) = t(x, \theta) - G[\theta, a(\theta)]$. Assume that, for each x, $v(x, \theta)$ is decreasing in $\theta$ and assumes the value 0 for some $\theta = \theta^* = \theta^*(x)$. Then the (generalized maximum likelihood) estimator $\theta^*$ is admissible; the proof utilizes the fact that $t(x, \theta)$ is a monotone function of a likelihood ratio for each $\theta$. Taking $G = 0$ and $(\theta'' - \theta') \to 0$ for each $\theta$, $v(x, \theta) = 0$ defines the maximum likelihood estimator, which is thus (under a mild regularity assumption) shown to be admissible. Taking $a(\theta) = .5$ makes $\theta^*$ a median-unbiased estimator. Taking $a(\theta) = .95$, for example, makes $\theta^*$ an upper .95 confidence limit. More generally, for an observed x, let $a(\theta, x)$ = Prob $[t(X, \theta) < t(x, \theta) \mid \theta]$ for each $\theta$, and let $c(\theta, x) = \min [a(\theta, x), 1 - a(\theta, x)]$. Then $c(\theta, x)$ or its graph is a “confidence curve estimate” which embraces confidence limits at all levels and a median-unbiased point-estimate. Approximate computations of $c(\theta, x)$ are obtained conveniently by adapting iterative maximum likelihood solutions due to Fisher, using “normalized scores.”

*Now at the Institute of Mathematical Sciences, New York University.


614 LEROY S. BRENNA and CLYDE Y. KRAMER (Texaco Research Labs, Beacon, New York and Virginia Polytechnic Institute, Blacksburg, Virginia). Factorial Treatment Combinations in Lattices.


Intra-block analysis and the analysis utilizing the recovery of inter-block information are presented for four types of lattice designs which incorporate factorial treatment combinations. The lattice designs considered are repetitions of: the simple lattices, lattices having (k - 1) replications of the $k^2$ treatments, a set of rectangular lattices where the number of treatments is nk (2 nk -1) with (k - 1) replications of each treatment, and cubic lattices. The three associate designs are three associate cubic designs and three associate rectangular designs.

The adjusted treatment sum of squares for each design is obtained 
as a function of the treatment estimates. A two factor factorial is 
then introduced and tests of significance given. 

A chi-square statistic is obtained as a function of the treatment estimates for tests of significance when the treatment estimates are based on the utilization of recovery of inter-block information. A two-factor factorial is introduced, and tests of significance given for the factorial treatment combinations.

BIOMETRICS, DECEMBER 1959

Contrasts for individual degree of freedom comparisons are given 
for both types of analysis for all designs considered. The modifications 
necessary for considering multi-factor factorials are included. The 
variances and covariances for factorial treatment combinations are 
provided for all types of designs considered. 

The lattice designs are developed using standard lattice notation. 
The three associate designs use the notation commonly used with 
partially balanced incomplete block designs. Only those designs in 
which independent sums of squares are available for main effects and 
interactions are presented. The limitations of lattice designs, relative 
to independent sums of squares for all effects, are discussed. 


615 THOMAS A. BUDNE (Statistical Engineering Consultant, Great Neck, New York). The Application of Random Balance Designs.


Experience has shown that Random Balance experiments can be a very effective means for screening large numbers of variables in a limited number of samples to find the few which are the largest contributors to the effects under consideration. In manufacturing, testing, and development areas, in particular, it is not at all unusual to find non-statistical technical personnel searching for the causes of large effects among large numbers of variables. It is important that a relatively simple, effective, and economically feasible screening technique be made available to such personnel to use without the assistance of an expert statistician. The Random Balance Design is an answer to this problem. Since experience in industrial situations consistently shows that only a very few out of any large number of suspect variables are major contributors to a problem, the requirements for highly sensitive statistical designs are small and the Random Balance designs are particularly attractive.

The most useful Random Balance designs are restricted random 
samples of full factorial designs. Full factorial and fractional factorial 
designs may be parts of the total design. A synthesized example illus- 
trates such a design and the techniques of analysis using “‘scatter plots” 
for each variable, non-parametric, and other tests of significance. 

Actual case histories in industrial situations are described. Where large effects of a variable can be determined with very few tests, the sequential nature of the analysis of test results permits the isolation of such a variable and the termination or modification of the experiment at that point.


616 JOSEPH L. CIMINERA, CLEMENT A. STONE, and JOHANNES IPSEN (Merck Sharp & Dohme, West Point, Pennsylvania). The Statistical Evaluation of a Biological Assay for Anti-Emetic Drugs.*


The use of Ipsen’s method for obtaining optimum appropriate scores 
in the evaluation of a biological assay for anti-emetic drugs is discussed. 
The method combines graded observations for quantal responses into 
one continuous score system. Evaluation based on a quantal response 
was shown to be only two-thirds as efficient as that based on a quanti- 
tative response using optimum scores. 


617 CLUNIES-ROSS, C. W. (Virginia Polytechnic Institute, Blacksburg, Virginia). Some Effects on Wearout.


The exponential distribution may be characterized by the fact that the conditional “failure” rate is constant, i.e., lifetimes are distributed as the waiting time for the initial disturbance from a Poisson process the parameter of which is constant. One method of allowing for (irreversible) wearout is to consider the underlying Poisson process as one whose parameter is an increasing function of time.

This paper investigates two statistical properties of such wearout. One property is that the differences between ordered observations which, with suitable multipliers, are independent and identically distributed for the exponential distribution now form a stochastically monotonic, non-independent sequence. Another property is that the standard deviation is less than the mean.
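The second property can be illustrated with one familiar family of wearout distributions. The Weibull lifetime is used here purely as an example of a Poisson parameter that increases with time; it is not taken from the abstract, and the function name is illustrative.

```python
import math


def weibull_cv(shape):
    """Coefficient of variation (standard deviation / mean) of a Weibull lifetime.

    The Weibull failure rate is proportional to t**(shape - 1): constant for
    shape = 1 (the exponential case) and increasing in t for shape > 1.
    """
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / g1 ** 2 - 1.0)


for shape in (1.0, 1.5, 2.0, 3.0):
    print(shape, weibull_cv(shape))
```

For shape = 1 the ratio is exactly 1, as for the exponential distribution; for the increasing failure rates (shape > 1) it falls below 1, so the standard deviation is less than the mean.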


618 L. C. A. CORSTEN (University of North Carolina, Chapel Hill, North Carolina). On Triangular Partially Balanced Incomplete Block Designs.


The three eigenspaces and eigenvalues of NN' where N is the incidence matrix of the triangular partially balanced incomplete block designs are derived from the association scheme and the parameters $r$, $\lambda_1$, and $\lambda_2$. This knowledge provided Ogawa a condition for the existence of such designs. The solution of the normal equation for the vector of treatment effects which is largely determined by NN' follows as a linear combination of the orthogonal projections of the vector of adjusted treatment sums on two of these eigenspaces. A now obvious reparametrization of the effect of the treatment in the ith row and the jth column of the association scheme as $t_i + t_j + t_{ij}$, the sum of the effects of the ith and the jth row (or column), and an effect $t_{ij}$ orthogonal to those two effects (row-column interaction), gives a physical meaning to the association classes; the variances of estimable treatment contrasts reduce to a very simple form. Orthogonal treatment components which belong to one of the two eigenspaces can easily be tested in a quasi-independent way. Similar considerations simplify tests and variance estimation of orthogonal treatment components (e.g., effects of factorials) in balanced incomplete block, group divisible, and Latin-square type partially balanced incomplete block designs.

*The paper by Ciminera, Stone, and Ipsen (abstract 616) has been published in the Journal of the American Pharmaceutical Association, Scientific Edition, Vol. XLVI, No. 3, March [1957].


619 DAVID B. DUNCAN (University of North Carolina, Chapel Hill, North Carolina). A Simple Minimum Average Risk Procedure for the Multiple Comparisons Problem.


Let $[y_1, \ldots, y_n, s^2]$ be a sufficient estimator for $[\mu_1, \ldots, \mu_n, \sigma^2]$ such that, to take a typical simple case, $[y_1, \ldots, y_n]$ is normally distributed with mean $[\mu_1, \ldots, \mu_n]$ and variance $I\sigma^2$, and $s^2$ is the usual form of independent estimate (with $\nu$ degrees of freedom) for $\sigma^2$. Let T represent the class of n(n - 1) differences $T = \{\tau : \tau = (\mu_i - \mu_j)/\sqrt{2}\,\sigma,\ i \neq j\}$. The subset system of the multiple comparisons problem considered is that formed as the restricted product of the two-decision component-problem subset pairs $\tau \le 0$, $\tau > 0$ for all $\tau \in T$. A Bayes solution of the form indicated in [1] Duncan, Ann. Math. Stat., 1958, p. 622, is developed for each component problem. Their simultaneous application, for all $\tau \in T$, is shown (see also [2] Lehmann, Ann. Math. Stat., 1957, pp. 1-25) to be the Bayes solution to the given multiple comparisons problem with respect to a loss function formed as the sum of the component loss functions and to an a priori distribution function having the component a priori distribution functions at its margins. The table of t in [1] is used for each component solution, the test statistic now also being a function of the residual variance among the y's and having $\nu + n - 2$ instead of $\nu$ degrees of freedom. (Research jointly supported by the U. S. Air Force through the Office of Scientific Research of the Air Research and Development Command, by the U. S. Navy through the Office of Naval Research and by the U. S. Public Health Service).


620 FRED EDERER (National Cancer Institute, Bethesda, Maryland). A Rapid Method for Estimating the Standard Error of a Survival Rate.


People who compute survival rates by the life table method com- 
monly calculate the standard error of this rate from a formula de- 
veloped by Major Greenwood in 1926. Greenwood’s formula, although 
an approximation, requires considerable computation. For example,
computing the standard error of the 5-year survival rate requires five 
multiplications, five divisions, a summation of five terms, the extraction 
of a square root, and a multiplication. Where life tables and survival 
rates are generated in great numbers, as they frequently are with cancer 
registry data, computing standard errors becomes an overwhelming, 
if not prohibitive, task unless an electronic computer is accessible. 
Yet, survival rates cannot be effectively analyzed unless one has some 
notion of their reliability, which the standard error provides. Green- 
wood’s formula has another drawback: using it requires knowledge not 
only of the over-all survival rate, but of detailed life table data. The 
medical literature abounds with survival rates without standard errors, 
but the reader of medical literature usually cannot assess the reliability 
of published survival rates because detailed life table data are almost 
never published. 
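For reference, the computation the abstract describes as laborious can be sketched as follows. The formula coded is the usual statement of Greenwood's approximation (cumulative survival rate times the square root of the sum of d/(n(n - d)) over intervals); the life table is hypothetical and the function name is illustrative.

```python
import math


def greenwood_se(intervals):
    """Cumulative survival rate and its standard error by Greenwood's formula.

    intervals: list of (n_i, d_i) pairs, the number at risk and the number
    dying in each interval. Withdrawals are assumed to be reflected already
    in the effective numbers at risk.
    """
    survival = 1.0
    total = 0.0
    for n_i, d_i in intervals:
        survival *= (n_i - d_i) / n_i          # product of interval survival rates
        total += d_i / (n_i * (n_i - d_i))     # one term of the sum per interval
    return survival, survival * math.sqrt(total)


# Hypothetical five-year life table (number at risk, deaths); not from the abstract.
life_table = [(100, 20), (80, 10), (70, 7), (63, 3), (60, 3)]
rate, se = greenwood_se(life_table)
print(rate, se)
```

The function needs the full life table, not just the five-year rate, which is the second drawback the abstract mentions. With no withdrawals, as in this made-up table, the result reduces to the ordinary binomial standard error of the five-year rate.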

These models assume that the rate at which cases enter observation 
is (1) uniform, (2) increasing at the rate of 5 percent per year, and 
that the mortality rate is (a) constant for each year of follow-up and 
(b) decreasing at a rate similar to that found in cancer patients. So-called “H-values” have been computed for each of the models which make reasonably accurate and rapid estimation of the standard error of the survival rate possible.

621 DONALD P. GAVER, JR. (Westinghouse Research Laboratories, Pittsburgh, Pennsylvania). Interruptions in Waiting Lines.


This paper studies stochastic processes associated with a single- 
server waiting-line system with stationary compound Poisson arrivals 
and general independent service times, where interruptions are permitted 
to occur in server availability. The interruption process is as follows: 
given that no interruption is in progress, the time to the appearance of 
the next interruption-producing event is exponentially distributed; 
interruption durations are independent with arbitrary distribution. 
Interruptions appearing during the service of an arrival either take effect instantly (preemptive) or are delayed to the end of the current service time (postponed); interrupted services either take up from the interruption point (resume) or begin again from scratch (repeat).

Such a process can be analyzed in terms of completion times, the 
latter being the durations of the periods that must elapse between the 
instant at which service of one arrival begins and that at which service 
of the next arrival may begin (does begin, except at the ends of busy 
periods). For the present process, successive completion times are 
independently and identically distributed. They can be used in place 
of service times in an imbedded Markov chain analysis of the waiting 
line of arrivals. Using an imbedded chain and renewal theory the 
following results are obtained: 

(i) Distributions of completion times for various server-interruption interactions are characterized as Laplace-Stieltjes transforms; the joint distribution of completion time and total associated interruption time is also thus characterized. Moments of completion time in terms of service and interruption processes are found.

(ii) The distribution of the duration of a busy period is represented as the solution of a functional equation, as are associated joint distributions. Moments are found.

(iii) Let N(t) denote the number of arrivals waiting in line (including the one being served) at time t after some initial instant. It is shown by renewal theory that, if a suitably defined traffic intensity parameter is less than unity, $\lim_{t \to \infty} \Pr[N(t) = n] = q_n$ exists, and $\{q_n\}$ is a bona fide probability distribution. For cases in which completion times terminate with the departure of an arrival the generating function of the above distribution is evaluated explicitly.

The interruption-producing events may be caused by arrivals 
arbitrarily assigned service priority, in which case interruption dura- 
tions are the busy periods of the priority arrivals. The present set- 
up is thus sufficient to study the effect of introducing priority service 
upon the low priority arrival waiting line. 


622 ABRAHAM M. LILIENFELD (Division of Chronic Diseases, Johns Hopkins School of Hygiene and Public Health, Baltimore, Maryland). Epidemiological Methods.


Epidemiology is concerned with the study of the distribution of a 
disease or condition in a population and of those factors that influence 
this distribution. Such knowledge is useful for the following reasons: 
1. Leads to the development of hypotheses concerning possible etiologic 
factors. 2. Permits testing of hypotheses developed in the clinic or 
laboratory. 3. Provides the scientific bases for measures to control the
disease. 

Epidemiologic studies provide data for the derivation of a series of 
statistical associations between a disease and various characteristics of 
the population. 

There are two general types of epidemiological data, each of which
can be further subclassified, as follows: 1) General Population Studies:
a) Routinely obtained statistics—death certificates, morbidity report- 
ing, etc., b) Special population morbidity surveys. 2) Individual 
History Data: a) Retrospective studies, b) Prospective studies, c) 
Experimental studies. 

The advantages and disadvantages of these various kinds of studies 
were discussed. 

From the pattern of the statistical associations derived from these 
studies, biological inferences with respect to etiology, modes of trans- 
mission of infectious disease, etc. are derived. 


623 NATHAN MANTEL and WILLIAM HAENSZEL (National
Cancer Institute, Bethesda, Maryland). Statistical Aspects of
the Analysis of Data from Retrospective Studies of Disease.


The role and limitations of retrospective investigations of factors 
possibly associated with the occurrence of a disease are discussed and 
their relationship to forward-type studies emphasized. Examples of 
situations in which misleading associations could arise through the use 
of inappropriate control groups are presented. The possibility of mis- 
leading associations may be minimized by controlling or matching on 
factors which could produce such associations; the statistical analysis 
will then be modified. Statistical methodology is presented for analyzing 
retrospective study data, including chi-square measures of statistical
significance of the observed association between the disease and the 
factor under study, and measures for interpreting the association in 
terms of an increased relative risk of disease. An extension of the 
chi-square test to the situation where data are subclassified by factors 
controlled in the analysis is given. A summary relative risk formula, 
R, is presented and discussed in connection with the problem of weighting 
the individual subcategory relative risks according to their importance
or their precision. Alternative relative-risk formulas, $R_1$, $R_2$, $R_3$,
and $R_4$, which require the calculation of subcategory-adjusted propor-
tions of the study factor among diseased persons and controls for the
computation of relative risks, are discussed. While these latter formulas
may be useful in many instances, they may be biased or inconsistent
and are not, in fact, averages of the relative risks observed in the separate 
subcategories. Only the relative-risk formula, R, of those presented, 
can be viewed as such an average. The relationship of the matched- 
sample method to the subclassification approach is indicated. The 
statistical methodology presented is illustrated with examples from a 
study of women with epidermoid and undifferentiated pulmonary
carcinoma. (Published in the Journal of the National Cancer Institute, 
22: 719-748, [1959]). 
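
The abstract does not state its formulas. The sketch below therefore uses the forms now commonly associated with the published paper, and it is an assumption that they correspond to the summary relative risk R and the extended chi-square test described above. Each subcategory is a fourfold table (a, b, c, d), with a = diseased with the factor, b = diseased without it, c = controls with the factor, d = controls without it. The function name and the example counts are hypothetical.

```python
def mantel_haenszel(tables):
    """Summary relative risk and continuity-corrected chi-square (1 DF).

    tables: iterable of (a, b, c, d) counts, one tuple per subcategory.
    """
    numerator = 0.0
    denominator = 0.0
    observed = 0.0
    expected = 0.0
    variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        # Weighted cross-products pooled over subcategories.
        numerator += a * d / n
        denominator += b * c / n
        # Observed, expected and variance of the 'a' cell given the margins.
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    relative_risk = numerator / denominator
    chi_square = (abs(observed - expected) - 0.5) ** 2 / variance
    return relative_risk, chi_square


# Hypothetical counts for two subcategories (not data from the paper).
example_tables = [(10, 20, 5, 40), (8, 12, 6, 30)]
summary_rr, summary_chi2 = mantel_haenszel(example_tables)
single_rr, single_chi2 = mantel_haenszel([(10, 20, 5, 40)])
print(summary_rr, summary_chi2)
```

With a single subcategory the summary reduces to the ordinary cross-product ratio ad/bc, which gives a quick check of the implementation.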


624 J. W. PRATT (Harvard University, Cambridge, Massachusetts) 
Zeros and Ties in the Wilcoxon Signed Rank Test. 


A Wilcoxon one-sample (or matched pairs) signed rank test may 
be done when some of the observations (observed differences) are 0 
by dropping the 0’s before ranking. However, this may make a sample 
not significantly positive while a more negative sample (obtained by 
decreasing each observation equally), is significantly positive by the
ordinary Wilcoxon test. The reverse is also possible. Two-piece 
confidence regions result. This can be avoided by ranking the observa- 
tions including the 0’s, dropping the ranks of the 0’s and rejecting the 
null hypothesis if the sum of the remaining negative (or positive) ranks 
falls in the tail of its null distribution (given the number of 0’s).

If observations are tied in absolute value, their ranks may be aver- 
aged before attaching signs. This changes the null distribution. A 
sample may be significantly positive which is not significant if the 
observations are increased (unequally), or if the ties are broken in any 
way. 
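
The two ways of handling zeros described above can be compared with a short sketch. It computes only the rank sums; the null distribution needed for the test itself is not implemented. Tied absolute values receive averaged ranks, as in the second paragraph. The function names and the sample differences are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_sums(differences, rank_with_zeros):
    """Return (sum of positive ranks, sum of negative ranks).

    rank_with_zeros=False drops the zeros before ranking.
    rank_with_zeros=True ranks all observations, zeros included, and then
    discards the ranks belonging to the zeros.
    """
    if rank_with_zeros:
        data = list(differences)
    else:
        data = [d for d in differences if d != 0]
    ranks = average_ranks([abs(d) for d in data])
    positive = sum(r for d, r in zip(data, ranks) if d > 0)
    negative = sum(r for d, r in zip(data, ranks) if d < 0)
    return positive, negative


sample = [0, 0, 1, -2, 3, 4]  # hypothetical observed differences
dropped_first = signed_rank_sums(sample, rank_with_zeros=False)
ranked_with_zeros = signed_rank_sums(sample, rank_with_zeros=True)
print(dropped_first, ranked_with_zeros)
```

In the second procedure the nonzero observations keep the higher ranks they earned in the presence of the zeros, so the rank sums must be referred to a null distribution that depends on the number of zeros.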


625 ALAN ROSS (University of Kentucky, Lexington, Kentucky).
Variance Estimates in “Optimum” Sample Designs.


Formulas describing optimum sample allocation in stratified and 
two-stage sample designs are given ordinarily in terms of “best” esti-
mates of a population mean or total. The best allocation for estimating
the variance of the mean may depart from that which is optimum for 
the mean. For stratified sampling, it is shown that such a departure 
depends upon variation among strata in functions of coefficients of 
kurtosis for the strata (fourth moment ÷ squared second moment).
For two-stage designs an equation for optimum sample allocation for 
estimating the variance of the mean is developed which depends upon 
coefficients of kurtosis as well as costs and variance components. Exami- 
nation of the expressions for optimum variance estimation and some 
calculations for specific cases indicate that losses in efficiency of variance
estimates due to using optimum allocation for a mean are not extreme 
in a variety of particular cases. 


626 HAROLD RUBEN (Columbia University, New York, New York).
Some New Results on the Bivariate Normal Integral.


After a brief review of previous work on the bivariate normal integral, 
the paper shows that the bivariate normal integral may be expressed 
as the difference of the two probability contents, under a centered 
circular normal distribution of two infinite sectors with one arm of 
each sector passing through the center of the distribution. Asymptotic 
expansions as well as convergent continued fraction developments are 
derived for the probability contents of sectors of the above type, the 
latter formulae being in a sense bivariate generalizations of Laplace’s 
well-known continued fraction formula for Mills’ ratio. The formulae
are two-parameter formulae and are particularly useful when
the cut-off point in the original integral is not too near the center of the 
distribution, and also when the correlation coefficient is numerically 
large. 

It is further shown that in some situations the required probability 
may be expressed in terms of a single asymptotic expansion, as well 
as a single corresponding continued fraction, rather than in terms of 
the difference of two expansions or two continued fractions. 

Finally, expansions for the bivariate normal integral are obtained 
suitable for the case where the cut-off point is at a small or moderate
distance from the center of the distribution. 


627 JOHN W. TUKEY (Princeton University, Princeton, New Jersey
and Bell Telephone Laboratories, Murray Hill, New Jersey).
Little Pieces of Mixed Factorials.


If one experimental pattern can measure certain effects by applying
t combinations of certain variables to t trials, runs, or plots, divided
into b blocks; if another experimental pattern can measure certain
other effects by applying t' combinations of other variables to t' trials,
runs, or plots, and if t' = b, then there is an experimental pattern which
will measure both sets of effects in t trials, runs, or plots. To see this
it is only necessary to apply each treatment-combination of the second
experiment to a block of the first experiment. If t' ≠ b, one can duplicate
either blocks in the first experiment, or treatment combinations in the
second, to make t' = b. Thus the 8 main effects of a $2^8$ can be measured
in 16 trials divided into 8 blocks of two. (Take a saturated $2^7$ in 8
and “reflect”, as G. E. P. Box and K. B. Wilson, J. Roy. Stat. Soc.
B, 13: 1-38, [1951], taking each original treatment combination and its
reflection as a block.) When any one block is duplicated, these 8 main
effects can be measured in 18 trials divided into 9 blocks of 2. A Greco-
Latin square allows the measurement of the main effects of a $3^4$ in 9
trials. Combining, all the main effects of a $2^8 3^4$ can be measured in
18 trials. (16 DF for main effects, 1 DF for error.) Such patterns
serve all the purposes which the saturated fractions of mixed factorials
would serve if they existed.
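
The 16-trial pattern in 8 blocks of two can be sketched as follows. The abstract does not say which saturated plan in 8 runs is meant; the sketch assumes one built from a Sylvester-type Hadamard matrix of order 8, adds an eighth factor held at +1, and reflects each run to form its block partner. The function names are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2), entries +1/-1."""
    h = [[1]]
    while len(h) < order:
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def folded_design():
    """Return 16 (block, levels) pairs: 8 two-level factors, 8 blocks of 2."""
    h8 = sylvester_hadamard(8)
    # Dropping the constant first column leaves 7 mutually orthogonal
    # columns, one per factor, in 8 runs.
    base = [row[1:] for row in h8]
    runs = []
    for block, row in enumerate(base):
        original = row + [1]  # eighth factor at +1 in every original run
        reflected = [-x for x in original]
        runs.append((block, original))
        runs.append((block, reflected))
    return runs


runs = folded_design()

# Each factor is balanced within every block, so main effects are free of
# block differences; distinct factor columns are orthogonal over 16 runs.
within_block_sums = [
    sum(levels[j] for b, levels in runs if b == block)
    for block in range(8)
    for j in range(8)
]
cross_products = [
    sum(levels[j] * levels[k] for _, levels in runs)
    for j in range(8)
    for k in range(j + 1, 8)
]
print(len(runs), set(within_block_sums), set(cross_products))
```

The resulting 8 blocks can then each receive one treatment combination of a second experiment with 8 trials, as the general argument above describes.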

The saturated fractions of $r^n$ factorials (R. A. Fisher, Annals of
Eugenics 12: 283-290, [1945]) can be manipulated to provide patterns
of $r^k$ trials in $r^{k-1}$ blocks of size r. Thus the above construction can be
extended to such little pieces as 81 trials for all main effects in a $3^{27} 5^6$
or 162 trials for all main effects in a $2^{80} 3^{27} 5^6$. (The latter is quite close
to saturation, using 158 DF for main effects, and only 3 DF for error.)
Many other possibilities of both more widely useful, and more extreme, 
numbers of trials exist. 


628 G. S. WATSON (Princeton University, Princeton, New Jersey).
Dispersion on a Sphere.


In this paper a survey is made of the methods for the statistical 
analysis of observations which are directions. Although the methods 
allow the directions to be in any number of dimensions the most im- 
portant application, palaeomagnetism, requires the theory only for three 
dimensions. For directions which are not widely dispersed about the 
mean direction it is often possible to treat this dispersion by methods
similar to those arising in normal theory. In particular, tests that several
populations have the same mean direction or that two populations have 
the same amount of dispersion may be made with good approximation 
by F-tests. This analogue is very helpful in suggesting analyses in 
more complicated problems. 
WILFRED LESLIE STEVENS


W. L. Stevens died suddenly in São Paulo, Brazil, on June 2, 1958.

He was born in Bristol, England, on June 25, 1911, and after 
obtaining degrees in Physics and Mathematics from the universities of 
Reading (1930) and Cambridge (1934), held the following positions: 
assistant, Galton Laboratory, University of London; statistician, 
Rothamsted Experimental Station, England; professor of anthropology, 
University of Coimbra, Portugal; technical advisor, Elvas Experimental 
Station, Portugal; statistician, Imperial Chemical Industries Ltd., 
England; head of the Department of Economic Statistics of the British
Admiralty; professor of statistics, University of São Paulo, Brazil.

His activities included participation in numerous committees and 
consulting functions in several organizations and government depart- 
ments, such as: Committee for the Calculation of Mathematical Tables 
of the British Association for the Advancement of Science; Secretariat
of Agriculture of the State of São Paulo, Brazil; Brazilian Coffee Insti-
tute; Ministry of Agriculture of Uruguay.

His always interested and obliging collaboration and advice 
were often requested by staff members and research workers from 
several departments of the University of São Paulo and other scientific
institutes in Brazil. He also gave several courses on statistics in these 
departments and institutes, besides the regular courses in his own 
department. Among the scientific societies of which he was a member 
are the Institut International de Statistique and The Biometric Society, 
in which he served as Regional President for Brazil. 


CHANGES IN MEMBERSHIP 
(July 15-September 30, 1959) 
Changes of Address 
Miss Martha L. Agan, 11666 Kiowa Avenue, Los Angeles 49, California, 
U.S.A. 
Dr. David W. Alling, National Cancer Institute, Robin Building, 
National Institutes of Health, Bethesda 14, Maryland, U.S.A. 


Mr. Jack R. Borsting, Department of Mathematics and Mechanics, 
U.S. Naval Postgraduate School, Monterey, California, U.S.A. 
Dr. Samuel H. Brooks, c/o General Analysis Corporation, 11753 Wilshire
Boulevard, Los Angeles 25, California, U.S.A. 

Dr. Robert V. Brown, Medical Division, CDEE, Porton, Wiltshire,
England 

Mr. William C. Burrows, Soil and Water Conservation Research Labora- 
tory, Morris, Minnesota, U.S.A. 

Mr. Kenneth A. Busch, Robert A. Taft Sanitary Engineering Center, 
4676 Columbia Parkway, Cincinnati 26, Ohio, U.S.A. 

Dr. William S. Connor, 3502 Manford Drive, Durham, North Carolina,
U.S.A. 

Prof. Gertrude M. Cox, Research Triangle Institute, 505 West Chapel 
Hill Street, Durham, North Carolina, U.S.A. 

Mr. Jonas M. Dalton, 154 Morris Avenue, Summit, New Jersey, U.S.A. 

Mr. Richard J. Daum, 5504 42nd Avenue, Hyattsville, Maryland, U.S.A. 

Major H. H. Earle, Jr., Operations Research Group, Army Chemical 
Center, Maryland, U.S.A. 

Dr. Barton Roby Farthing, Louisiana Agricultural Experiment Station,
P. O. Box 8877, Baton Rouge 3, Louisiana, U.S.A. 

Mr. Charles F. Federspiel, School of Medicine, Vanderbilt University, 
Nashville 5, Tennessee, U.S.A. 

Mrs. Elsie D. Foard, 1815 W. Smallwood Drive, Raleigh, North 
Carolina, U.S.A. 

Dr. Werner Forster, Inst. f. Pharmakologie, Leipziger Str. 44, Magde- 
burg, Germany 

Dr. N. R. Fraser, Ley Road, St. James, Cape Town, South Africa 

Dr. Benson Ginsburg, University of Chicago, 5650 S. Ellis, Chicago 37,
Illinois, U.S.A. 

Mr. Marvin Glasser, 44 Buswell Street, Boston, Massachusetts, U.S.A. 

Mr. Samuel W. Greenhouse, 1724 Ladd Street, Silver Spring, Maryland, 
U.S.A. 

Dr. Edward Cuyler Hammond, American Cancer Society, 521 West 
57th Street, New York 19, N. Y., U.S.A. 

Mr. M. J. R. Healy, Bell Telephone Laboratories, Murray Hill, New 
Jersey, U.S.A. 

Mr. David Hogben, Apt. 30, 102 Montgomery Street, Highland Park, 
New Jersey, U.S.A. 

Dr. T. W. Horner, Booz-Allen Applied Research, Inc., 4921 Auburn 
Avenue, Bethesda 14, Maryland, U.S.A. 

Mr. J. Edward Jackson, Conesus, New York, U.S.A. 

Mr. G. H. Jowett, Department of Statistics, University of Melbourne,
Carlton N. 3, Victoria, Australia 

Dr. Leo Katz, Office of Naval Research Branch Office, Navy No. 100, 
Fleet Post Office, New York City, N. Y., U.S.A. 
Dr. Therese Kelleher, Department of Genetics, North Carolina State
College, Raleigh, North Carolina, U.S.A. 
Mr. William F. Kwolek, 3 Ann Street, New City, New York, U.S.A. 
Dr. Eschscholtzia L. Lucia, 2073 Seventh Avenue, Sacramento 18, 
California, U.S.A. 
Mr. Donald G. MacEachern, 100 Gloucester Street, Apt. 208, Toronto, 
Ontario, Canada 
Mr. Leonard A. Marascuilo, 514 66th Street, Oakland 9, California, 
U.S.A.
Dr. Margaret Merrell, Shelburne, New Hampshire, U.S.A.
Dr. K. R. Nair, Central Statistical Organization, B Barracks, Jan Path, 
New Delhi, India 
Dr. Peter H. Ovenburg, Department of Zoology, Michigan State Univer- 
sity, East Lansing, Michigan, U.S.A. 
Dr. G. I. Paul, Department of Actuarial Mathematics and Statistics, 
University of Manitoba, Winnipeg, Manitoba, Canada 
Dr. Edward B. Perrin, Graduate School of Public Health, University 
of Pittsburgh, Pittsburgh, Pennsylvania, U.S.A. 
Mr. O. W. Robison, Department of Animal Industry, North Carolina 
State College, Raleigh, North Carolina, U.S.A. 
Prof. Heihachi Sakamoto, Roka-Bunjyo-Danchi, No 6, Karasuyama- 
machi 1075, Setagawa-ku, Tokyo, Japan 
Dr. Francesco Sella, Room 3464 D, United Nations, New York, N. Y., 
U.S.A.
Miss Enes Barbara Taucci, 2225 A Cornell Street, Palo Alto, California, 
U.S.A. 
Dr. George W. Thomson, 15890 Sussex Avenue, Detroit 27, Michigan, 
U.S.A. 
Mr. A. M. W. Verhagen, School of Agriculture, C.S.I.R.O., Melbourne 
University, Carlton N. 3, Victoria, Australia 
Dr. Antonio M. Vilches, Pan American Sanitary Office, 1501 New 
Hampshire Avenue, N.W. Washington 6, D. C., U.S.A. 
Mr. Irving Weiss, 45 Maple Avenue, Andover, Massachusetts, U.S.A. 
Mr. Pao-lu Yu, 24 Highland Cross, Rutherford, New Jersey, U.S.A. 


New Members 

At Large 

Mr. Arnljot Høyland, Department of Mathematics, University of Oslo,
Blindern, Oslo, Norway 

Australasian Region 


Mr. I. *. Horton, 50 Hamilton Street, Corinda, Brisbane, Queensland, 
Australia 
Belgian Region


Dr. Robert Biemans, c/o Medecin Provincial, Service Medical 
Provincial, Leopoldville, Belgian Congo 

Mr. E. J. E. Buyckx, 22 rue de la Sablonniere, Brussels, Belgium 

Mr. Marc Dalebroux, 24 Rue du Moulin, Chatelineau, Belgium


Brazilian Region 


Mr. Elizue R. de Andrade Alves, Caixa Postal 900, Belo Horizonte,
Minas Gerais, Brazil

Mr. Benjamin Cintra, Av. Francisco Matarazzo 455, São Paulo, Brazil

Mr. Manoel Felix da Silva, Escola de Agronomia do Nordeste, Areia, 
Paraiba, Brazil 

Mr. Fabio Ribeiro Gomes, Escola Sup. de Agriculture, Vicosa, Minas 
Gerais, Brazil 

Mr. Alvaro Marchi, Caixa Postal 8105, São Paulo, S. P., Brazil

Mr. Jose Rodolpho Torres, Escola Sup. de Agriculture, Vicosa, Minas 
Gerais, Brazil 


Eastern North American Region 


Dr. Fred C. Andrews, Department of Mathematics, University of 
Oregon, Eugene, Oregon, U.S.A. 

Mr. Ben Bereskin, 1008 Burnett, Ames, Iowa, U.S.A. 

Dr. Albert E. Drake, Research Data Analysis, Alabama Polytechnic 
Institute, Auburn, Alabama, U.S.A. 

Mr. A. Timothy Ewald, Department of Biophysics and Biometrics, 
Medical College of Virginia, Richmond 19, Virginia, U.S.A. 

Prof. Raymond I. Fields, Speed Scientific School, University of Louis- 
ville, Louisville 8, Kentucky, U.S.A. 

Dr. L. T. Higgins, 918 N.E. 17th, Oklahoma City 5, Oklahoma, U.S.A. 

Dr. Eileen B. Karsh, Yale University, 52 Hillhouse Avenue, New Haven, 
Connecticut, U.S.A. 

Mr. Richard Polk Lehman, Animal Husbandry Department, Virginia 
Polytechnic Institute, Blacksburg, Virginia, U.S.A. 

Prof. R. C. Lewontin, Department of Biology, University of Rochester, 
Rochester 20, New York, U.S.A. 

Mr. James E. Mangan, Jr., 2 Tremont Avenue, Binghamton, New York, 
U.S.A. 

Dr. John C. Mithoefer, Mary Imogene Bassett Hospital, Cooperstown, 
New York, U.S.A. 

Mr. Alan Ross, Medical Center, University of Kentucky, Lexington,
Kentucky, U.S.A. 
Mr. William H. Sammons, 8714 63rd Avenue, Berwyn Heights, Mary-
land, U.S.A.

Mr. Paul Benjamin Siegel, Poultry Department, Virginia Polytechnic 
Institute, Blacksburg, Virginia, U.S.A. 

Mr. James W. Smith, 106 Polk Hall, North Carolina State College, 
Raleigh, North Carolina, U.S.A. 

Mr. Lloyd Dale Van Vleck, Wing Hall, Cornell University, Ithaca,
New York, U.S.A. 

Mr. Lyle H. Wadell, Graduate Assistant, 31 Curtiss Hall, ISC, Ames, 
Iowa, U.S.A.

Mr. W. G. Warren, Department of Statistics, University of North 
Carolina, Chapel Hill, North Carolina, U.S.A. 


Italian Region 

Dr. Agostino Ciabattini, Via Savarna 189, Mezzano (Ravenna) Italy 

Dr. Mario Morea, Clinica Chirurgica, S. Pietro 8, Sassari, Italy

Japan 

Mr. Chooichiro Asano, 180 Mashitashima, Mishima-cho, Mishima-gun, 
Osaka, Japan 

Switzerland 


Dr. Hans Burla, Gockhausen (Zurich 44), Geerenackerstr. 7, Switzerland 


Western North American Region 


Dr. Norman N. Anderson, Department of Psychology, University of 
California, Los Angeles 24, California, U.S.A. 

Mr. Donald H. Hazelwood, Science Hall, State College of Washington, 
Pullman, Washington, U.S.A. 

Mr. Harold A. Hoffman, Department of Genetics, University of 
California, Berkeley 4, California, U.S.A. 

Dr. Lowell A. Woodbury, 248 University Street, Salt Lake City 2, 
Utah, U.S.A. 
NEWS AND ANNOUNCEMENTS


Members are invited to transmit to their National or Regional Secretary 
(if members at large, to the General Secretary) news of appointments, 
distinctions, or retirements, and announcements of professional interest.


INTERNATIONAL FEDERATION OF GYNECOLOGY 
AND OBSTETRICS 


The Third World Congress of the International Federation of Gyne- 
cology and Obstetrics will take place on September 3-9, 1961, at Vienna 
(Austria), under the chairmanship of Professor Antoine. 

The two main subjects of the Congress will be: 1. Surgical treatment 
in gynecology and obstetrics. 2. The role of the pituitary gland in the 
physiology and pathology of genital organs. 

The scientific program will comprise 4-5 main conferences presented 
by the scientists and physicians not specialized in gynecology and 
obstetrics on subjects correlated with the main themes of the Congress. 
These subjects will be presented in a series of papers submitted, on 
invitation, by gynecologists and will be discussed in Round Table 
Conferences. The list of subjects and rapporteurs shall be established 
by the Committee on Scientific Program after consultation with the 
affiliated Societies. 

The scientific program will include a series of short papers of 8-10 
minutes’ duration, presented during simultaneous sessions. The short 
papers related to the main subjects of the Congress shall enjoy priority 
for admission to the program. The persons concerned will have to send 
a summary of their paper to their national Societies who will forward 
it with their recommendation to the President of the Scientific Program 
Committee, Professor Antoine, in Vienna. The Committee on Scientific
Program shall make the final selection.

Participation in the scientific program is restricted, in principle, 
to the members of Societies affiliated with the International Federation
of Gynecology and Obstetrics. 


FELLOWSHIP AND RESEARCH OPPORTUNITIES


The Division of Mathematics, National Academy of Sciences—
National Research Council, calls attention to the fact that several 
foundations and offices offer a number of fellowships as well as financial
support for basic research in mathematics during the year 1960-61. 
A partial list, with addresses, is as follows: 

1. National Science Foundation, National Academy of Sciences— 
National Research Council, 2101 Constitution Avenue, Washing- 
ton 25, D. C. 

2. Office of Naval Research, Mathematics Branch, Office of Naval 
Research, Washington 25, D. C. 

3. Air Force Office of Scientific Research, Commander, Air Force 
Office of Scientific Research, Attention: Director of Mathematical 
Sciences, Washington 25, D. C. 

4. Office of Ordnance Research, U. S. Army, Commanding Officer,
Office of Ordnance Research, Box CM, Duke Station, Durham, 
North Carolina. 

5. Fulbright Awards, Committee on International Exchange of 
Persons, Conference Board of Associated Research Councils, 2101 
Constitution Avenue, Washington 25, D. C. 

6. National Bureau of Standards, Naval Research Laboratory, Air 
Research and Development Command, Naval Ordnance Labora- 
tory, Navy Electronics Laboratory. (Post-doctoral resident 
research associateships tenable in Washington, D. C., Boulder, 
Colorado, and various laboratories of the agencies listed.) 

7. Atomic Energy Commission, Division of Research, Atomic Energy 
Commission, Washington 25, D. C. 

8. Brookhaven National Laboratory, M. I. Rose, Head, Applied 
Mathematics Division, Brookhaven National Laboratory, Upton, 
Long Island, New York. 

Other information on fellowship and research opportunities is given 
in the Bulletin, A Selected List of Major Fellowship Opportunities and 
Publications for Educational Support available from the Fellowship
Office, National Academy of Sciences—National Research Council, 
2101 Constitution Avenue, Washington 25, D. C. 


VISITING MATHEMATICIANS


The Division of Mathematics, National Academy of Sciences—
National Research Council, announces the appearance of its annual 
list of Visiting Foreign Mathematicians. This bulletin includes informa- 
tion regarding mathematicians and statisticians spending some part of 
this academic year in the United States and gives the dates of their 
visit together with the host institution. Copies may be obtained by 
writing to the Division of Mathematics, National Academy of Sciences— 
National Research Council, 2101 Constitution Avenue, Washington 25,
D. C.


NEWS ABOUT MEMBERS 


ENAR 


Sidney Addelman, Harold Baker, Shriniwas Katti, Scott Krane, 
Harold Larson, Jose Nieto and B. V. Shah were appointed associates 
and/or instructors of statistics at Iowa State University for the year 
1959-60. 

David W. Alling, formerly a graduate student in statistics at Cornell 
University, is now employed as a Medical Officer in the Therapeutic 
Trial Section, Cancer Chemotherapy National Service Center, National 
Institutes of Health, Bethesda, Maryland. 

Professors T. A. Bancroft and Oscar Kempthorne of Iowa State 
University and Professors E. C. Bryant and Robert White of the Uni- 
versity of Wyoming taught for the two-month duration of the 1959 
Summer Institute for College Teachers of Statistics, sponsored by the 
National Science Foundation and presented jointly by the University 
of Wyoming and Iowa State University at Laramie. There were 64 
participants representing colleges in 34 different states. Special weekly 
seminars were presented by Professors A. H. Bowker, W. E. Deming, 
Franklin Graybill, Morris Hansen, H. O. Hartley, D. V. Huntsberger, 
and H. O. Wold. 

Allan Birnbaum is presently Associate Professor of Mathematical 
Statistics at New York University. He points out that New York 
University’s Department of Mathematics now offers a Ph.D. program 
in statistics and probability. Dr. Birnbaum was awarded this year a 
Guggenheim Fellowship to be used in a future year for research and 
writing on a unified theory of estimation. 

William C. Burrows is Soil Physicist for the USDA-ARS at the 
North Central Soil and Water Conservation Research Station. 

William S. Connor has left the National Bureau of Standards to
take a position as Statistician with the Research Triangle Institute in 
Durham, North Carolina. 

C. Philip Cox, Head of the Statistics Section, The National Institute 
for Research in Dairying, Shinfield, England, is a Visiting Associate 
Professor of Statistics at Iowa State University for the academic year 
1959-60. 

Gertrude M. Cox has accepted the position of Head of the Statistics 
Research Division of the Research Triangle Institute in Durham, 
North Carolina. Professor Cox still retains her post as Director of the
Institute of Statistics at North Carolina State College. She will be 
working half time on both positions until, at least, July 1, 1960. 

Jonas M. Dalton is a member of the Technical Staff of the Bell 
Telephone Laboratories, Murray Hill, New Jersey.

Richard J. Daum has taken the post of Analytical Statistician for 
the USDA at Beltsville, Maryland. 

William J. Dewey, formerly at the University of Minnesota, is 
presently employed as a Research Assistant in the Department of 
Medical Genetics at the University of Wisconsin in Madison. 

A. R. G. Emslie was appointed earlier this year as Director of the 
Animal Research Institute, Department of Agriculture, in Ottawa, 
Canada. 

Barton Roby Farthing has left the Department of Animal Industry 
at North Carolina State College to take the position of Experiment 
Station Statistician at Louisiana State University and A. and M. College 
in Baton Rouge, Louisiana. 

Mrs. Elsie D. Foard retired from her position of Statistician, ARS, 
Human Nutrition Research Division, U. S. Department of Agriculture
at Beltsville, Maryland on June 30, 1959. She is residing at present in 
Raleigh, North Carolina. 

Wayne Fuller was appointed Assistant Professor of Statistics at 
Iowa State University beginning September 1959. 

Marvin Glasser is a candidate for an Sc.D. in Biostatistics at the 
Harvard School of Public Health in Cambridge, Massachusetts. 

Edwin F. Grey is presently a Mathematical Statistician at the 
Headquarters, Middletown Air Material Area, Olmstead AFB in 
Middletown, Pennsylvania. He was formerly a Mathematical Statis- 
tician with the Quartermaster R and E Command. 

David Hogben is on leave of absence from his job as Quality Control 
Development Engineer for Western Electric Company. He is presently 
a Graduate Assistant at Rutgers University in New Jersey. 

Theodore W. Horner has taken the post of Senior Statistician with 
Booz-Allen Applied Research, Inc. in Bethesda, Maryland. Booz-Allen 
is involved in statistical research and consultation on governmental and 
industrial contracts. Dr. Horner was formerly Senior Operations Re- 
search Analyst at General Mills in Minneapolis. 

Rex L. Hurst is Visiting Professor at Iowa State University in Ames. 
He is on Sabbatical from Utah State University. 

Leo Katz has taken the position as Scientific Liaison Officer for the 
Department of Navy, Office of the Naval Research Branch Office in 
New York. He was formerly Professor and Head of the Department of 
Statistics and Director of the Statistical Laboratory at Michigan State
University in East Lansing. 

S. P. H. Mandel, Sydney University, Sydney, Australia, was
appointed Assistant Professor of Statistics at Iowa State University 
beginning September 1959. 

Ethelyne L. McBee is teaching mathematics at the Westlake School 
for Girls in Los Angeles, California. 

G. I. Paul is presently Assistant Professor of Statistics in the Depart-
ment of Actuarial Mathematics and Statistics at the University of
Manitoba, Winnipeg, Canada. Dr. Paul was formerly Assistant Pro- 
fessor of Genetics at McGill University where he went after receiving 
his Ph.D. degree at North Carolina State College in Raleigh, North 
Carolina. 

Odis Wayne Robison has been appointed Assistant Professor in the 
Department of Animal Industry at North Carolina State College, 
Raleigh, North Carolina.

Vincent Schultz is on leave from the University of Maryland working 
as an ecologist-statistician with the Environmental Sciences Branch, 
Division of Biology and Medicine, U. S. Atomic Energy Commission,
Washington 25, D. C. The leave is for one year. 

Harry H. Shorey has taken the position of Junior Entomologist in 
the Department of Entomology, University of California, Riverside, 
California. 

Robert G. D. Steel has returned from one year at the Mathematical 
Research Center, U. S. Army, University of Wisconsin, Madison, 
Wisconsin, to his permanent position of Assistant Professor of Bio- 
logical Statistics at Cornell University, Ithaca, New York. 

Miss Enes Barbara Taucci is presently residing in Palo Alto, Cali- 
fornia while doing graduate work at Stanford University. 

Alan E. Treloar, formerly Director of the Hospital Research and 
Educational Trust, is now Chief, Statistics and Analysis Branch, Divi- 
sion of Research Grants, National Institutes of Health, Bethesda, 
Maryland. The Statistics and Analysis Branch is a new unit of the 
Division of Research Grants charged with continuing statistical analysis 
and appraisal of the total extramural grants-in-aid programs of the 
National Institutes of Health. 

Pao-Lo Yu has accepted the position of Statistician at the John 
L. Smith Memorial for Cancer Research, Charles Pfizer and Company 
in Rutherford, New Jersey. 

Dr. George Zyskind (Ph.D., ISU) was appointed Assistant Professor
of Statistics at Iowa State University beginning September 1959.
rence formulas
checks, 46 
electronic, 113, 494 
machine, 494 
simplified, 626
Confidence circles, see joint confidence 
limits 
Confidence limits, 415, and see tests 
approximate, 230 
asymptotic, 3 
for ED 50, 424 
for path coefficient, 254 
for regression, 149 
intersection, 323, 488 
for systematic sampling, 271 
joint, 160, 164, 491, 494 
simultaneous, see joint 
Confounding, 68, 74, and see analysis of 
variance, comparisons, design of 
experiments, factorial experiments 
Consulting, statistical, 164 
Consumer testing, 582, and see
organolepsis
Contingency tables, 107, 150, 454, 538,
582, 625, and see chi square,
correlation, fourfold table
Contrast, see comparisons
Correlation, see causation, contingency,
covariance, path coefficients,
regression
analysis of, see path coefficients 
between relatives, see statistical 
genetics 


intraclass, 219, 418, 471, 514 
intrinsic, 469 
matrix, 584 
serial, 340 
spatial, 286 
Covariance, see correlation, matrix,
regression, variance
adjustment, 103
analysis of, 44, 327, 486
missing data, 486
matrix, 93, 442, 618
homogeneity of, 394
Cross-over design, see rotation
extra-period, 116
analysis of, 122
Culling, see selection
Curve fitting, see estimation, orthogonal
polynomials, regression, smoothing
Cybernetics, 162
Cycles, 159, 242, and see rotation
experiments
spatial, 287


Decision, see tests 
Defining contrast, 612 
Degrees of freedom, 32, and see com- 
parisons 
effective, 155 
loss of, see reduction of 
reduction of, 442 
single, 309, 328, 538, 587, 634 
Design of experiments, 405, 479, and see 
alias, augmented design, block, 
composite design, components of 
variance, confounding, cross-over 
design, design matrix, factorial
experiments, fractional replication, 
half-leaf method, incomplete blocks, 
latin squares, lattices, long-term 
experiments, optional stopping, 
organolepsis, paired comparisons, 
randomization, randomized blocks, 
replication, rotation experiments, 
sample size needed, sampling, sensi- 
tivity of experiment, sequential 
experiments, triangle test, Youden 
squares 
cyclic, see cross-over, rotation 
extra-period, 116 
for locating optimum, 331, 557 
fractional, 332 
genetic, 142, 143, 219, 376, 417, 513, 
527 
organoleptic, 406 
random balance, 634 
theory, 157, 339 
two-phase, 60 
Determinantal equation, 448, and see 
matrix algebra 
Diagnosis, 338 
Dilution series, 1 
Direction, analysis of, 642 
Discriminant function, 165, 340, and 
see multivariate analysis, scores 
Distance, generalized, 165 
Distribution, see binomial, chi square, 
goodness of fit, moments, normal, 
Poisson 
approximate, 566 
asymptotic, 583 
conditional, 441 
discrete, 448 
exponential, 635
truncated, 7
fitting, see estimation 
-free test, see test 
identified, 19 
joint, 206 
of rank total, 564 
of sample mean, 282 
sorting, 304 
spatial, see dispersion on sphere 
Wishart’s, 533 
Divergence, 317 
Dominance, 137, 145 
Dose-response curve, 316, 426, 493, 573, 
and see time-response curve 
D statistic, see generalized distance 
Dual, 258 
Ecology, 165, 489, 628, and see behavior, 
entomology, genetics, populations, 
sampling 
Economics, 17 
ED 50, 575, 606 
confidence limits, 424 
Edgeworth series, 466 
Efficiency, 51, 80, 101, 129, 259, and 
see estimation, maximum likeli- 
hood, sufficient statistics 
asymptotic, 6, 435
loss of, 307 
of estimation, 51, 70, 620 
of grouping, 433 
Eigenspace, 635 
Eigenvalue, 340, 635, and see matrix 
algebra 
Eigenvector, 340 
Endocrinology, 146, 148, 310, 334 
Entomology, 165, 604, and see toxicology 
Epidemiology, 162, 335, 496, 638 


Epistasis, 147, 530 
Equilibrium, 20 
stability of, 25 
Errata, 631 
Error, 
biased, 32, 124 


components of, 31 

experimental, 31 

mean square, see variance 
measurement, 237, 574 

regression, see regression 

rate, 560 

response, 237 

theory of, see analysis of variance,
least squares, maximum likeli-
hood, models, rejection of data 
type I, 419, 443, 452 
type II, 419
Estimation, 153, 337, 435, 574, and see 
bias, combination of estimates, co- 
variance adjustment, efficiency, 
fitting constants, information, least 
squares, maximum likelihood, mini- 
mum variance, scores, Sheppard’s 
correction, stochastic approxima- 
tion, sufficient statistics 
consistent, 300 
distribution-free, 543 
errors correlated, 69 
inconsistent, 434 
inefficient, 4 
joint, 491 
nonlinear, 331 
of mean of Poisson, 324 
of quadratic response surface, 611 
of variance components, 71 
sequential, 551 
simultaneous, see joint 
Expectation, mathematical, see moments 
of mean square, 49 
Experimental design, see design of 
experiments 
Extreme deviate, 538 
Factor analysis, 248, 400, 496, and see 
multivariate analysis 
Factorial experiments, 332, 611, 633, 
and see bioassay, fractional
replication
analysis of, 38, 327, 334 
mixed, 641 
Feedback, 236, 250 
Fiducial limits, see confidence limits 
Field experiments, see agronomy, design 
of experiments, long-term experi- 
ments 
Fitting constants, 119, 192, and see 
least squares, missing values 
Fitting regression line, see covariance,
least squares, regression
both variables subject to error, 491
Forestry, 409 
Fourfold table, 153, 441, and see chi 
square, contingency 
F test, see analysis of variance, beta
function
multiple, see multiple range test
Genetic advance, 311, 377
Genetic correlation, 15, 142, 469, 520,
and see genetic covariance,
heritability, path coefficients,
repeatability


Genetic covariance, 12, 418, 518, and 
see genetic correlation, genetic
variance 

Genetic equilibrium, 20 

Genetic homeostasis, 137, and see 
homeostasis 

Genetic model, see models 

Genetic progress, 11, 192, 514 

maximum, 225, 516 

Genetics, 538, and see blood groups, 
chromosomes, dominance, epistasis, 
genetic, genotype-environment in- 
teraction, heritability, heterosis, 
inbreeding, linkage, loci, path co- 
efficients, repeatability 

human, 142 

nonadditive effects, 137, 138 
pleiotropy, 518 

population, 184, and see statistical 
statistical, 219, 521 

theoretical, 141, 158 

Genetic selection, 138, 140, 141, 147,
159, 307
equilibrium under, 20 
optimum family size, 376, 417, 513 
prediction, 146 
restricted, 10 

Genetic variance, 137, 141, 147, 338,
418, 518
components, 144, 221 

Geology, 642 

Goodness of fit, 440, and see chi square

Graeco-latin square, 61, 74, 642 

Grouping, 151, 433, 442 

Growth, 149, 151, 161, 165, and see
allometry, biometry, morphology, 
populations 

rate, 98 

Half-leaf method, 61 

Hematology, 156, and see blood groups 

Heredity, see genetics 

Heritability, 15, 141, 142, 143, 148, 513, 
and see genetic correlation, repeat- 
ability 


confidence limits for, 227 
estimation of, 417, 475 
distribution-free, 227 
variance of estimated, 224 
Hermite polynomials, 456 
Heterosis, negative, 146 
Homeostasis, 242, and see genetic 
Homogeneous coordinates, 164 
Horticulture, see agronomy
Hotelling’s T², 390
Hypergeometric, 299, 493 
Hypothesis, see models, tests 
null, 108, 406, 443, 561 
test of, 231, 385 
Identifiability, 
over-, 248 
under-, 248 
Immunology, 87 
Inbreeding, 140, 146 
Incomplete blocks, 38, and see lattices,
randomized blocks, Youden square 
analysis of, 544 
partially balanced, 62, 260 
triangular, 635 
row-balanced, see incomplete latin
square 
simple partially linked, 259 
Incomplete experiments, see missing 
values 
Inference, 248, and see tests 
Information, 100, 162, 248, 441, and see 
analysis of variance, design of ex- 
periments, estimation, maximum 
likelihood 
interblock, 60, 80, 263, 544, 633 
loss of, 38, 126 
matrix, 446 
theory, 625 
Interaction, 107, 309, 593, and see 
analysis of variance, models 
as error term, 32, 311, 414 
genotype-environment, 10, 137, 143, 
477 
interpretation of, 144 
Iteration, 72, 107, 203, 324, 450, 579, 
and see computation, least squares, 
maximum likelihood 
Judging, see organolepsis 
Kronecker product, see matrix, direct 
product 
Lagrange multipliers, 13
Lagrange series, 324
Latent roots, 22, 165, 446, 585, and see
matrix algebra 
Latin squares, 33, 61, 116, 309, 339, 395 
incomplete, 116 
Lattices, 633 
rectangular, 333 
LD 50, see ED 50 
Learning, 
effects of, 389 
theory, 397 
Least squares, 44, 75, 102, 144, 192, 242,
and see adjusted mean, analysis of 
variance, estimation, fitting con- 
stants, matrix, normal equations 
weighted, 72 
Legendre polynomials, 444 
Life tables, 494, 635, 637 
Likelihood, 207, 589, and see maximum 
likelihood 
Likelihood-ratio, 441, 625 
Linkage, 139, 140, 145 
Loci, number of, 139, and see genetics 
Long-term experiments, 30 
Main effect, see analysis of variance, 
interaction 
Mann-Whitney U statistic, 228 
Market research, 337 
Markov, 638 
Matching, see balancing, paired com- 
parisons 
Mathematical biology, see biometry 
Matrix, 16, 57, 105, 309, and see co- 
variance matrix, determinantal 
equation, eigen-, information 
matrix, latent roots, multivariate 
analysis, quadratic forms 
algebra, 195 
design, 616 
direct product of, 147, 191, 309, 589 
inversion, 618 
Maximum likelihood, 1, 62, 72, 92, 151, 
203, 241, 300, 324, 433, 489, 577, 
589, 632, and see efficiency, esti- 
mation, information 
inappropriate, 1 
incomplete data, 326 
Mean, see moments 
adjusted, 76, and see least squares 
sample, 282
Medicine, 335, 489, 496, 577, and see
blood groups, clinical, diagnosis, 
genetics, hematology, immunology, 
pharmacology, physiology, serology, 
toxicology, virology 

retrospective studies, 639 

Metameter, 576, and see scales 

Metric, see scales 

Microbiological assay, see bacteriology, 
bioassay 

Minimum variance estimate, 69, 100, 
229 

Missing values, 393, 486, and see fitting 
constants, rejection of data 

Model, see biometry, causation, 
components of variance, factor 
analysis, feedback, growth, hy- 
pothesis, missing values, path co- 
efficients, regression, transforma- 
tions 

analysis of variance model I, 335 
biological, 591 

genetic, 312, 417 

logistic, 490 

mathematical, 99, 157, 166, 193, 591 
mixed, 149, 389 

probability, 382 

selection of, 64, 88 

statistical, 298 

Moments, 567 

Monte Carlo, 156 

Morphology, 161 

Mortality, 87, 489, 496 

Moving averages, 424 

Multinomial distribution, 108, 150, 441, 
583 

confidence limits, 495 

Multivariate analysis, 149, 150, 330, 
335, 390, 416, 493, and see analysis 
of variance, determinantal equa- 
tion, discriminant function, factor 
analysis, generalized distance, 
matrix, principal component 

Newton’s method, 107 

Nonparametric tests, 560 

Normal distribution, 153, and see
normality
bivariate, 641
cumulative, see probit transformation
grouped, 433
integrated, see probit transformation
Normal equations, 44, 105, and see
least squares, matrix 
Normality, 
nonnormality, 155, 590 
Nutrition, 104 
Obituary, 360, 643 
Observational equations, 105 
Operating characteristic, 492 
Optional stopping, 555 
Ordering, 441, and see tests 
Organolepsis, 157, 298, 382, 492, and 
see scores, triangle test 
Orthogonal functions, 44 
Orthogonality, see comparisons 
nonorthogonality, 35, 561 
Orthogonal polynomials, 99, 466 
unequal intervals, 187 
Orthogonal squares, completely, 120 
Pairing, see balancing, paired compari- 
sons 
Palatability, see organolepsis 
Path coefficients, 236, and see causation 
Percentages, see proportions 
Perception, see organolepsis 
Periodicity, see cycles 
Pharmacology, 163, 591, and see 
bioassay, toxicology 
Physiology, 339, 594, and see cyber- 
netics, endocrinology, medicine, 
threshold 
Poisson distribution, 635, 637 
truncated, 324 
Populations, see distributions, statistical 
genetics 
dynamics, 166 
management of, see ecology 
Poultry, 
egg production, 145 
selection, 15, 143, 144 
Power, 336, 385, 441, and see tests 
function, 157, 300, 417 
Precision, see information, least squares 
Prediction, 491, 496, and see regression 
Preferences, see organolepsis 
Probability, 89, 299 
confidence limits for, 495 
generating function, 229 
geometrical, 158 
Programming, 
linear, 336 
nonlinear, 336
Proportions, see binomial
analysis of, 490 
comparison of, 87 
Psychology, 332, 389, 396, and see 
cybernetics, organolepsis, psycho- 
logical tests 
Quadratic forms, 442 
Quality, see organolepsis 
Quantification, see scales 
Queue, see stochastic processes 
Radiology, 337, 493, 577 
Randomization, 34, 74, 157, 414, and 
see bias, selection 
restricted, 340 
Randomized blocks, 39, and see
balancing, incomplete blocks
Random process, see stochastic processes
Random variable, 544 
Range, see tests using range 
Rank, see scores, transformations 
analysis of, 586 
sum test, 560 
tied, 563, 640 
zeros, 640 
Rate, see proportion 
Ratio, see proportion 
expected value, 557 
Recurrence formulas, 20, 188, 567 
Regression, see adjusted means, col- 
linearity, correlation, covariance, 
fitting regression line, orthogonal 
polynomials, prediction, trend 
analysis, 236, 307, and see analysis of 
covariance 
asymptotic, 52 
chain, 242 
coefficient, variance of, 111
differential, 326 
estimate, 551 
homogeneity of, see divergence 
independent variable, estimate of, 551 
intercept, 307 
intersection, 323 
multiple, 242, 309 
periodic, 159 
Rejection of data, 632, and see missing 
values, selection 
Reliability, 402 
system, 494 
Repeatability, 193, 514, and see genetic
correlation, heritability
Replication, 30, 133, 629 
fractional, 33, 309, 332, 612, and see 
alias 
unequal, 486 
Residual, see error 
effect, 33, 116 
response, 52 
Response, see bioassay, dose-response, 
models, time-response 
error, 237 
graded, 598 
multinomial, 573 
quantal, 310, 591, 635 
semi-, 573 
surface, 331, 334 
quadratic, 611 
variance of, 621 
Rotation experiments, 30, and see 
cross-over 
unequal cycles, 33 
Sample size needed, 419, 479, 628, and 
see optional stopping, organolepsis 
Sample survey, see consumer testing, 
sampling 
Sampling, 157, and see components of 
variance, design of experiments, 
market research, sample size 
needed, sequential tests 
acceptance, 492 
area, 270 
error, see variance 
optimum, 640 
random, 272 
stratified, 640 
studies of statistical problems, 551, 
586, and see Monte Carlo 
systematic, 270 
variables, 492 
Scales, 405, 576, and see scores 
Scedasticity, 155 
Scores, 405, 590, and see discriminant 
function, organolepsis, ranks, scales 
Screening tests, see bioassay 
Selection, 192, and see balancing, design 
of experiments, genetic selection, 
organolepsis, randomization, re- 
jection of data, sampling 
index, 10, 469 
natural, 20, and see competition 
of comparisons, 588
of controls, 162
of grouping interval, 443 
of group size, 219, 421 
of treatments, 153 
of values of independent variable, 333 
of variates, 416 
of weights, 70, 554 
restricted, 10 
truncation, 11 
Sensitivity data, see quantal response 
Sensitivity of experiment, 405 
Sensory tests, see organolepsis 
Sequential experiments, 318, 334, 551, 
612 
multivariate, 493 
Sheppard’s correction, 151, 436 
Significance, 336, and see selection 
Sign test, 496 
Simultaneous equations, see matrix 
Skewness, 466 
Smoothing, see moving averages 
Spatial, see correlation, cycles 
Sphere, dispersion on, 642 
Split plots, 39 
Standard deviation, see variance 
Standard error, see variance 
Statistical control, see analysis of 
covariance 
Statistics texts and periodicals, 181, 510 
Stirling’s approximation, 325 
Stochastic approximation, 551 
Stochastic processes, 158, 397, 562, 635, 
637, and see random variables 
Structural analysis, 236 
Student’s t, see t test
Subjective evaluation, see organolepsis 
Successive approximation, see iteration 
Sufficient statistics, 441, and see ef- 
ficiency, estimation 
Survival curve, see dose-response, time- 
response 
Survival time, see time-response 
Switchback, see cross-over 
Symmetry, 46 
Tables, miscellaneous, 23, 26, 120, 264, 
319, 379, 380, 410, 422, 428, 570, 
616 
graphical, 220, 381 
Target theory, 493, and see radiology 
Taste tests, see organolepsis 
Taxonomy, see morphology
Teaching of statistics, 182, 508
Tests, see analysis of variance, Behrens- 
Fisher, chi square, comparisons, 
confidence limits, goodness of fit, 
likelihood ratio, null hypothesis, 
ordering, ranks, rejection of data, 
sequential, significance, sign test, 
triangle test, t test
asymptotically equivalent, 3 
conditional, 136 
distribution-free, 560 
exact, 452, 487 
multiple comparison, 560, 588 
multiple range, 495 
nonparametric, see nonparametric 
of significance, 154, 157 
of significance of 
difference between adjusted means, 
486 
difference between correlations, 
non-independent, 135 
difference between ED 50’s, 427 
difference between means, 486, 562 
difference of location, 562 
extreme deviate, 539 
largest difference, see multiple
range test
one-sided, 563
power of, 385, 419
psychological, 401
unweighted means, 336
using range, see multiple range test
Wilcoxon, 640
Theory, see biometry, hypothesis, model
Threshold, distribution of, 492
Time-response curve, 591, and see
dose-response
Time series, 340
Tolerance, 593
correlation of, 596
Toxicology, 426, 592, and see bioassay,
pharmacology
Transformations, 155, 160, 164, 330,
593, and see additivity, analysis of
variance, bioassay, discriminant
function, matrix, model
angular, 424, 540 
“Hadamard”, 330 
Helmert, 330 
inverse hyperbolic cosine, 330 
inverse sine, see angular
linear, 22
logarithmic, 628 
logit, 490 
of beta function, 493 
of percentages, 155 
probability, 443 
probit, 11 
z, 135 
Trend, 53, and see regression 
plot, 33 
Triangle test, 300 
t test, 136, and see confidence limits,
Hotelling’s T², tests
Tuberculosis, 92 
Uniformity data, 273 
U statistic, see Mann-Whitney 
Variance, see covariance, 
Sheppard’s correction 
analysis of, 30, and see additivity,
analysis of covariance, chi square, 
components of variance, degrees 
of freedom, error, fitting con- 
stants, interaction, least squares, 
long-term experiments, missing 
values, models, multivariate 
analysis, orthogonal polynomials, 
path coefficients, regression, struc- 
tural analysis, tests, transforma- 
tions, uniformity data 
computation of, 31, 42, 57, 66, 74 
interpretation of, 52, 68 
model I, 405 
model II, 405, 418
of two-phase experiment, 60 
power of, 340 
biased, 436 
components, 49, 141, 470, 515, and 
see structural analysis 
computation of, 61 
error, 414 
computation of, 36 
estimated, biased, 123, 436 
heterogeneity of, 61 
matrix, see covariance matrix 
of correlation, intraclass, 219 
of difference of means, 51, 78 
of estimate, 154 
of genetic correlation, 469 
of ratio, 483
of survival rate, 637
ratio, see beta function
noncentral, 406, 420
Viability, 20
Virology, 1, 61, and see half-leaf method
Wear curve, 635
Weighting, 60, 144, 155, 312, 466, 543,
553
random, 543
X-rays, see radiology
Youden squares, 393, and see incomplete
latin squares

be printed in News and Announcements should also be submitted double-spaced
and in duplicate.


SUSTAINING MEMBERS OF THE BIOMETRIC SOCIETY


Abbott Laboratories 

American Cancer Society, Inc. 
Heisdorf and Nelson Farms, Inc. 
Merck, Sharp and Dohme Research Laboratories 
Schering Corporation 

Smith, Kline and French Laboratories 

E. R. Squibb and Sons 

Wallace Laboratories, Division of Carter Products 
Wyeth Institute of Applied Biochemistry

BACK ISSUES


Back issues of Biometrics are available at the following postage-paid 
prices in U.S.A. currency: 


                          Price per        Price per
Year   Volume   Number    Single Number    Volume (unbound)
1945      1     1 to 6       $1.00            $6.00
1946      2     1 to 6        1.00             6.00
1947      3     1 to 4        1.50             5.00
1948      4     1 to 4        1.50             5.00
1949      5     1 to 4        1.50             5.00
1950      6     1 to 4        1.50             5.00
1951      7     1 to 4        2.00             8.00
1952      8     1 to 4        2.00             8.00
1953      9     1 to 4        2.00             8.00
1954     10     1 to 4        2.00             8.00
1955     11     1 to 4        2.00             8.00
1956     12     1 to 4        2.00             8.00
1957     13     1 to 4        2.00             8.00
1958     14     1 to 4        2.00             8.00
1959     15     1 to 4        2.00             8.00


Reprints of individual articles are not available except to authors at the 
time of printing. Three special issues are among the numbers listed 
above. They are: 

1947 Volume 3 Number 1 The Analysis of Variance 


1951 Volume 7 Number 1 Components of Variance 
1957 Volume 13 Number 3 The Analysis of Covariance 


Also available are: 
Fishery Reprint Series (Selected reprints from Vol. 5) $1.00 
Subject Index (Volumes 1-10) 1.00 
Proceedings, International Biometric Symposium, 
Campinas, Brazil, 1955. 1.00 


Inquiries, non-member subscriptions, and orders for back issues and 
other material listed above should be addressed to: BIOMETRICS, DEPARTMENT
OF STATISTICS, THE FLORIDA STATE UNIVERSITY, TALLAHASSEE,
FLORIDA, U.S.A.